The latest research from OpenAI and collaborating institutions reveals something unexpected about GPT-5: it's beginning to act less like a sophisticated search engine and more like that brilliant colleague who helps you crack tough problems.

In a comprehensive study spanning mathematics, physics, biology, and computer science, researchers documented cases where the AI model didn't just retrieve information. It generated novel proofs, identified hidden connections between disparate fields, and compressed months of theoretical work into hours.

From tool to thinking partner

Here's what caught my attention: GPT-5 solved four previously unsolved mathematical problems. It didn't approximate answers or merely suggest approaches; it actually solved them.

One of these, Erdős Problem #848, had stumped mathematicians for decades. The AI's contribution? A stability-style analysis that human mathematicians had overlooked, sandwiched between layers of human insight.

But let's pump the brakes for a second. This isn't about AI replacing scientists. Timothy Gowers, a Fields Medalist involved in the study, put it perfectly when he compared GPT-5's contributions to those of a knowledgeable research supervisor: helpful, sometimes insightful, but not yet at the level where you'd list them as a co-author on most papers.

The real magic shows up in what the researchers call the "compression factor." Brian Spears from Lawrence Livermore National Laboratory used GPT-5 to model thermonuclear burn propagation in fusion experiments.

Six hours of collaborative work with the AI accomplished what he estimated would have taken six person-months with a team of postdocs. That's not just efficiency; that's a fundamental shift in how research might be conducted.
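
To put that "compression factor" in rough numbers: assuming a person-month of about 160 working hours (my assumption, not a figure from the study), the back-of-the-envelope arithmetic looks like this:

```python
# Rough compression-factor arithmetic for the fusion example.
# Assumption: one person-month ~= 160 working hours (not stated in the study).
baseline_hours = 6 * 160   # six person-months of postdoc effort
assisted_hours = 6         # six hours of human-AI collaborative work

compression_factor = baseline_hours / assisted_hours
print(f"Compression factor: ~{compression_factor:.0f}x")  # ~160x
```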

The literature search revolution

Perhaps the most immediately practical application emerges from GPT-5's ability to perform what researchers call "deep literature search." This goes way beyond keyword matching.

The model identified that a new result in density estimation was mathematically equivalent to work on "approximate Pareto sets" in multi-objective optimization, a connection the human authors had completely missed because the fields use entirely different terminology.
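
To make that less abstract, here is a minimal sketch of the general idea behind semantic, embedding-based literature search, where conceptually related papers rank close together even when they share no keywords. This illustrates the concept, not how GPT-5's deep literature search actually works; the titles, query, and model choice are all placeholders:

```python
# A toy semantic literature search: embed titles and rank by cosine
# similarity, so cross-field matches survive vocabulary mismatch.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works

# Placeholder corpus: different fields, different phrasing, overlapping concepts.
corpus = [
    "Approximate Pareto sets in multi-objective optimization",
    "Minimax rates for nonparametric density estimation",
    "Register allocation via graph coloring heuristics",
]
query = "Trade-off frontiers between competing objectives in estimation"

corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_emb, corpus_emb)[0].tolist()
for score, title in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.2f}  {title}")
```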

In another striking example, GPT-5 located solutions to 10 Erdős problems previously marked as "open," including papers in German from decades ago. The model even found a solution hidden in a brief side comment between two theorems in a 1961 paper, something that had been overlooked by human reviewers for over 60 years.

Where human expertise remains essential

The research also illuminates crucial limitations. Derya Unutmaz's immunology experiments showcase both the promise and the peril.

GPT-5 correctly identified that 2-deoxy-D-glucose was interfering with N-linked glycosylation rather than just glycolysis in T-cells, a mechanistic insight the research team had missed despite deep expertise in the field. Yet the model also required constant human oversight to catch overconfident assertions and flawed reasoning.

Christian Coester's work on online algorithms demonstrates another pattern: GPT-5 excels at specific, well-defined subproblems but struggles with open-ended theoretical questions.

When asked to prove or disprove that a particular algorithm could achieve a certain performance bound, it produced an elegant counterexample using the Chevalley-Warning theorem. But when pushed for more general results, it often generated flawed arguments that required human correction.

The scaffolding effect

A fascinating pattern emerged across disciplines: GPT-5 performs dramatically better when properly "scaffolded." Alex Lupsasca discovered this when the model initially failed to find symmetries in black hole equations.

But after working through a simpler flat-space problem first, GPT-5 successfully derived the complex curved-space symmetries, reproducing months of human work in minutes.

This scaffolding requirement reveals something fundamental about current AI capabilities. These models possess vast knowledge and computational power, but they need human expertise to direct that capability effectively.
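
In practice, scaffolding can be as simple as a two-step prompting pattern: have the model solve a simplified version first, then feed that solution back as context for the full problem. The sketch below assumes a hypothetical ask_model() wrapper around whatever chat-completion API you use; it illustrates the pattern, not the researchers' actual setup:

```python
def ask_model(prompt: str) -> str:
    """Hypothetical wrapper around your LLM provider's chat endpoint."""
    raise NotImplementedError("plug in your API call here")

def scaffolded_solve(full_problem: str, simpler_problem: str) -> str:
    """Two-step scaffolding: warm up on an easier case, then escalate."""
    # Step 1: the easier version (e.g. the flat-space analogue).
    warmup = ask_model(f"Solve this simpler problem first:\n{simpler_problem}")

    # Step 2: pose the hard version with the warm-up solution as context.
    return ask_model(
        "Using your solution to the simpler case below, now tackle the full problem.\n\n"
        f"Simpler-case solution:\n{warmup}\n\n"
        f"Full problem:\n{full_problem}"
    )
```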

It's like having access to a Formula 1 engine: immensely powerful, but you still need to know how to build the rest of the car and drive it.

A cautionary tale

Not all stories in the research are triumphant. Venkatesan Guruswami and Parikshit Gopalan's experience with "clique-avoiding codes" serves as a crucial warning.

GPT-5 provided a correct proof for a problem they'd been curious about for years. Excitement turned to embarrassment when they discovered the exact same proof had been published three years earlier.

The AI had essentially plagiarized without realizing it, highlighting a critical challenge for AI-assisted research: ensuring proper attribution when the model might not always identify its sources.

What this means for AI professionals

For those of us working in AI, these findings suggest we're at an inflection point. GPT-5 isn't just a better GPT-4; it represents a qualitative shift in capability. But perhaps more importantly, it reveals that the path forward isn't about replacing human intelligence but about creating new forms of human-AI collaboration.

The researchers repeatedly emphasized that using GPT-5 effectively requires deep domain expertise. You need to know when the model is hallucinating, when to push back on its assertions, and how to scaffold problems appropriately. In essence, the better you are at your field, the more value you can extract from these AI collaborators.

As we move forward, the question is how we'll adapt our workflows, our attribution systems, and our understanding of creativity itself to accommodate these new collaborators.

If these early experiments are any indication, the future of science might look less like humans versus machines and more like the best of both, working in tandem to push the boundaries of knowledge.