Hey, I'm Philip, senior AI relations engineer at Google DeepMind. My days revolve around making our models more accessible to developers, helping you build applications, chatbots, and agents with Gemini. But here's what struck me during my recent talk: when I asked who had chatbots in production, hands shot up everywhere. When I asked about agents? The room got quiet.

That gap tells us something important. We're at this fascinating turning point where AI is shifting from answering questions to actually doing things. And if you're still thinking agents are just fancy chatbots, well, let me share what I've learned about where we're really headed.

The evolution from text completion to autonomous action

Remember when LLMs first hit the scene? They were basically sophisticated autocomplete tools. You'd start with "my" and the model would predict "name", then keep going until it had a complete sentence. Cool party trick, sure. But not exactly revolutionary for business applications.
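If you've never thought about what "autocomplete" means mechanically, it's just a loop: predict the next token, append it, repeat until the model decides it's done. Here's a tiny, purely illustrative Python sketch; the canned token list stands in for a real model, so nothing here is an actual API.

```python
# Conceptual sketch of plain text completion: the model only ever predicts
# the next token, which gets appended in a loop until it signals "done".
# The canned token list below is a stand-in for a real model call.
CANNED_TOKENS = iter([" name", " is", " Philip", ".", "<end>"])

def predict_next_token(text: str) -> str:
    # Hypothetical stand-in for "ask the model for the next token".
    return next(CANNED_TOKENS)

def complete(prompt: str, max_tokens: int = 20) -> str:
    text = prompt
    for _ in range(max_tokens):
        token = predict_next_token(text)
        if token == "<end>":   # the model decides the sentence is finished
            break
        text += token
    return text

print(complete("My"))  # -> "My name is Philip."
```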

The real shift happened when we realized these models needed to follow instructions, not just complete text. Think about it: if you ask "What's the capital of France?" and your model responds with "What's the capital of Germany?", that's technically good text completion, but completely useless for actual work.

So, we taught these models to follow instructions. That was step one. Then came the chatbot interfaces (hello, ChatGPT), making these systems conversational and user-friendly. But the game really changed when we introduced function calling: suddenly, our models could reach out and interact with external services.
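Here's a minimal, SDK-agnostic sketch of that function-calling loop in Python. To be clear, `call_model` and `get_weather` are hypothetical stand-ins, not the Gemini API; the point is the shape of the exchange: the model returns a structured request, your code runs the function, and the result goes back to the model so it can answer in plain language.

```python
# Sketch of function calling. The model never executes anything itself:
# it returns a structured request ("call get_weather with these args"),
# the app runs the tool, and the result is sent back to the model.
import json

def get_weather(city: str) -> dict:
    # In a real app this would call an actual weather API.
    return {"city": city, "temperature_c": 21, "condition": "sunny"}

TOOLS = {"get_weather": get_weather}

def call_model(messages: list[dict]) -> dict:
    # Hypothetical stand-in for an LLM that supports function calling:
    # first it asks for a tool, then it answers using the tool's result.
    if messages[-1]["role"] == "user":
        return {"function_call": {"name": "get_weather",
                                  "args": {"city": "Paris"}}}
    weather = json.loads(messages[-1]["content"])
    return {"text": f"It's {weather['condition']} and "
                    f"{weather['temperature_c']}°C in {weather['city']}."}

def chat(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    response = call_model(messages)
    if "function_call" in response:
        call = response["function_call"]
        result = TOOLS[call["name"]](**call["args"])    # the app runs the tool
        messages.append({"role": "tool", "content": json.dumps(result)})
        response = call_model(messages)                 # model sees the result
    return response["text"]

print(chat("What's the weather in Paris?"))
```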

Now? We're entering the agent era. Instead of back-and-forth conversations, we give the model a goal and let it run. It decides what tools to use, when to use them, and keeps working until the job's done. No hand-holding required.
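To make the difference concrete, here's a rough sketch of that agent loop, again with hypothetical stand-ins (`plan_next_step` and the two tools aren't from any real framework): the model picks the next action, your code executes it, and the loop keeps going until the model reports the goal is done or a step limit is hit.

```python
# Minimal agent loop sketch: instead of one request/response turn, the model
# is given a goal and repeatedly chooses an action until it declares success.
# plan_next_step, search_web, and write_file are illustrative stand-ins.

def search_web(query: str) -> str:
    return f"(pretend search results for: {query})"

def write_file(name: str, content: str) -> str:
    return f"wrote {len(content)} characters to {name}"

TOOLS = {"search_web": search_web, "write_file": write_file}

def plan_next_step(goal: str, history: list[str]) -> dict:
    # Stand-in for the model deciding what to do next, given the goal
    # and everything it has observed so far.
    if not history:
        return {"tool": "search_web", "args": {"query": goal}}
    if len(history) == 1:
        return {"tool": "write_file",
                "args": {"name": "report.txt", "content": history[0]}}
    return {"done": True, "answer": "Report written to report.txt."}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history: list[str] = []
    for _ in range(max_steps):                 # hard cap on autonomous steps
        step = plan_next_step(goal, history)
        if step.get("done"):
            return step["answer"]
        observation = TOOLS[step["tool"]](**step["args"])  # act, then observe
        history.append(observation)
    return "Stopped: step limit reached."

print(run_agent("Summarise the latest Gemini release notes"))
```

One design note worth keeping even in a toy version: the step cap. Once a model is acting on its own, you want a hard limit on how long it can keep going without a human checking in.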

A brief history lesson: how we got here

The first real AI agent was probably WebGPT from OpenAI. They literally sat humans down and recorded them browsing the web (every search, every click, every extraction of information), then fine-tuned GPT-3 on that dataset. Suddenly, GPT could browse the web using actions like search, browse, and click.

Then Meta dropped Toolformer, which taught LLMs to recognize when they needed external help. Ask a knowledge question? The model learned to search Wikipedia. Need to solve a math problem? It reached for a calculator.
