From the outset, analytics engineering seemed like the final stage in the evolution of business intelligence.

It created order through modern warehouses and declarative modeling tools. Dashboards became trusted, lineage was visible, and metrics were version-controlled.

Over time, it gained traction through communities such as the one around dbt, where it was established as a discipline focused on documentation, reproducibility, and testing best practices.

💡
For many, analytics engineering felt closer to software engineering than traditional reporting.

With the arrival of Artificial Intelligence, the focus shifted from describing the past to predicting the future.

Systems began generating new outputs and working with probabilities rather than fixed answers. Instead of deterministic SQL queries, teams began working with uncertainty. Compared to the clean, predictable nature of dashboards, AI systems feel fundamentally different.

This was the turning point.

Analytics engineering prepared me to build reliable reports. AI requires building intelligent systems. That shift demands a full-stack mindset.


Analytics engineering foundations: What we were trained to optimize

The data warehousing tradition was the foundation of analytics engineering.

We learned to prioritize clarity of structure and dimensional modeling, drawing from texts like The Data Warehouse Toolkit. The goal was consistency and trust, where the same SQL query would always produce the same result.

This determinism became the basis of stakeholder confidence. It worked because it created stability and shared understanding across teams.

However, it also introduced a set of assumptions:

  • Transformations are rigid
  • Answers are exact
  • The world is structured

AI challenges each of these assumptions.


The mindset gap: Deterministic pipelines vs probabilistic systems

Machine learning operates on probability, while analytics engineering is built on certainty.

💡
A dashboard might report revenue as $1.2m. A model, on the other hand, might predict a 72% probability that a customer will churn. One is definitive, the other is contextual.
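The contrast can be made concrete with a toy sketch. Everything here is illustrative: the revenue figures, the feature names, and the made-up weights standing in for a trained model.

```python
# Deterministic: the same aggregation over the same rows always yields
# the same answer. This is the world dashboards live in.
orders = [400_000, 500_000, 300_000]
revenue = sum(orders)  # always exactly 1,200,000

# Probabilistic: a model emits a score that only becomes a decision
# once a threshold and context are applied.
def churn_probability(features: dict) -> float:
    """Stand-in for a trained model; the weights are invented."""
    score = (0.4 * features["days_inactive"] / 30
             + 0.5 * features["support_tickets"] / 5)
    return min(max(score, 0.0), 1.0)

p = churn_probability({"days_inactive": 24, "support_tickets": 2})
will_churn = p >= 0.5  # the cutoff itself is a product decision, not a fact
```

The revenue number is reproducible by anyone with the same rows; the churn flag depends on a threshold someone chose.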

Research from Harvard Business Review reinforces this shift. Thomas H. Davenport and Rajeev Ronanki explain that successful AI systems deliver value within constraints, with usefulness taking priority over perfection.

This reframes what “correct” means.

Instead of asking whether something is correct, teams focus on:

  • Performance improvement
  • Comparison to a baseline
  • Value delivered to users

As a result, fixed validation gives way to experimentation. Metrics become distributions rather than absolutes, and progress is measured iteratively. For engineers used to deterministic systems, this shift can feel unfamiliar, yet it becomes essential.
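A minimal sketch of what "metrics become distributions" looks like in practice, using a bootstrap over a toy evaluation set. The labels, predictions, and majority-class baseline are all invented for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Toy holdout labels and predictions; real ones come from an evaluation set.
labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] * 20
model_preds = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1] * 20
baseline_preds = [1] * len(labels)  # naive baseline: always predict churn

def accuracy(y, p):
    return sum(a == b for a, b in zip(y, p)) / len(y)

def bootstrap_ci(y, p, n_resamples=1000):
    """Resample the eval set to report the metric as an interval, not a point."""
    scores = []
    for _ in range(n_resamples):
        draw = [random.randrange(len(y)) for _ in y]
        scores.append(accuracy([y[i] for i in draw], [p[i] for i in draw]))
    scores.sort()
    return scores[int(0.025 * n_resamples)], scores[int(0.975 * n_resamples)]

lo, hi = bootstrap_ci(labels, model_preds)
print(f"baseline accuracy: {accuracy(labels, baseline_preds):.2f}")  # 0.60
print(f"model accuracy 95% CI: [{lo:.2f}, {hi:.2f}]")
```

The question shifts from "is the number right?" to "does the interval sit clearly above the baseline?"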


The data problem gets harder: Messy inputs, drift, and continuous quality

AI introduces a level of complexity that structured analytics rarely encounters.

Data extends beyond clean tables to include logs, images, and unstructured text, all of which require ongoing interpretation and engineering. This data also evolves over time.

In traditional analytics, issues like null values or broken schemas were visible and relatively easy to diagnose. In AI systems, challenges emerge more subtly. Models can degrade while systems appear to function as expected.

Distributions shift.

User behavior evolves.

Language changes.

Simple checks such as row counts provide limited coverage in this context.

Modern AI systems require continuous monitoring and active data management. As Bernard Marr highlights, value from AI comes from actively governed data.
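One lightweight way to go beyond row counts is a distribution-drift check. The sketch below uses the Population Stability Index; the bin count, smoothing, and the usual thresholds (below 0.1 stable, above 0.25 major drift) are conventions, not standards, and the data is synthetic.

```python
import math
from collections import Counter

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 major drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    def bucket(xs):
        counts = Counter(min(int((x - lo) / width), bins - 1) for x in xs)
        # add-one smoothing so empty bins don't divide by zero
        return [(counts.get(b, 0) + 1) / (len(xs) + bins) for b in range(bins)]
    e, a = bucket(expected), bucket(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [x / 100 for x in range(1000)]       # training-time distribution
live = [x / 100 + 4.0 for x in range(1000)]  # shifted production data

print(psi(train, live))  # large value signals drift even if row counts match
```

A check like this runs on a schedule against production inputs, which is exactly the kind of ongoing monitoring a one-time schema test cannot provide.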

Data quality becomes an ongoing responsibility.


New responsibilities: From transformation to models and MLOps

In analytics, pipelines end at insight. In AI, they extend to action.

Dashboards support human decision-making. Models automate decisions.

This shift introduces a new set of responsibilities:

  • Model deployment and rollback
  • Training and evaluation
  • Monitoring predictions in production
  • Ensuring training-serving consistency

The lifecycle becomes continuous rather than static.
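The continuous lifecycle can be reduced to a hypothetical promotion gate: a candidate model replaces the live one only if it beats it on fresh evaluation data by a required margin, and otherwise the old model simply stays in place. The metric values, margin, and function names below are all invented for illustration.

```python
PROMOTION_MARGIN = 0.01  # illustrative threshold, tuned per use case

def should_promote(candidate_metric: float, production_metric: float,
                   margin: float = PROMOTION_MARGIN) -> bool:
    """Promote only on a clear win over the live model."""
    return candidate_metric >= production_metric + margin

def deploy(candidate_metric: float, production_metric: float) -> str:
    if should_promote(candidate_metric, production_metric):
        return "promote"       # roll the candidate out (e.g. via a canary)
    return "keep-production"   # implicit rollback: the old model stays live

print(deploy(0.84, 0.81))  # promote
print(deploy(0.80, 0.81))  # keep-production
```

In a real MLOps setup this gate sits inside a pipeline, but the shape of the decision is the same: evaluation, comparison, promotion or rollback, repeated on every retrain.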

Guidance from Google formalizes this approach under MLOps, where models are treated as production systems.

Frameworks like the ML Test Score, developed by Eric Breck and colleagues, provide structured ways to assess readiness and manage risk.

The risks are well documented. D. Sculley and colleagues show how quickly complexity builds in machine learning systems when pipelines are fragile or loosely defined.

Over time, shortcuts accumulate and systems become unstable.

ML systems are engineering problems as much as they are data challenges.


The full-stack reality: Infrastructure, product, and human trust

As soon as models are embedded into applications, the scope expands.

💡
Concerns such as latency, cost, and scalability become central. Real-time systems and APIs become part of the data workflow. At this point, work extends beyond reporting into product development.
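One concrete example of how the concerns change: a dashboard can tolerate a slow query, but a model behind an API has a latency budget. The sketch below computes a p95 latency against an SLO; the request times and threshold are made up.

```python
def p95(latencies_ms):
    """95th-percentile latency via the nearest-rank on a sorted sample."""
    xs = sorted(latencies_ms)
    return xs[int(0.95 * (len(xs) - 1))]

# Invented request log: mostly fast responses, with a slow tail.
request_latencies = [42, 38, 51, 47, 90, 44, 40, 210, 46, 39] * 10
SLO_MS = 200

print(f"p95 = {p95(request_latencies)}ms, "
      f"within SLO: {p95(request_latencies) <= SLO_MS}")
# p95 = 210ms, within SLO: False
```

The average here looks healthy; it is the tail that breaks the budget, which is why percentile monitoring belongs in the workflow once models serve traffic.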

Trust also becomes a defining factor.

Dashboards present verifiable numbers. Models make decisions that impact users. This introduces new expectations around transparency, bias, and accountability.

Users want explanations. Regulators expect oversight.

Trust becomes something that is designed, measured, and maintained alongside technical performance.


Conclusion

Analytics engineering provided strong foundations in lineage, reproducibility, testing, and discipline.

AI builds on these foundations while introducing uncertainty, continuous change, and new system-level challenges.

The boundaries between engineering, analytics, and product continue to blur. Data professionals increasingly think across the full stack, from data models to real-world impact.

The goal is to extend analytics engineering.

From clean dashboards to intelligent systems. From static pipelines to adaptive ones.

This is the shift AI demands, and it highlights the gap that analytics engineering alone did not fully address.


References

  • Breck, E., Polyzotis, N., Roy, S., Whang, S., & Zinkevich, M. (2017). The ML Test Score: A Rubric for ML Production Readiness.
  • Davenport, T. H., & Ronanki, R. (2018). Artificial Intelligence for the Real World. Harvard Business Review.
  • Google (2020). MLOps: Continuous Delivery and Automation Pipelines in Machine Learning.
  • Kimball, R., & Ross, M. (2013). The Data Warehouse Toolkit. Wiley.
  • Kleppmann, M. (2017). Designing Data-Intensive Applications. O’Reilly Media.
  • Marr, B. (2021). Data Strategy: How to Profit from a World of Big Data, Analytics and AI. Kogan Page.
  • Sculley, D., et al. (2015). Hidden Technical Debt in Machine Learning Systems. In Advances in Neural Information Processing Systems.