...

From Legacy Chaos to AI Confidence: Data Modernisation in Practice

Cristiano Valente
27 February 2026
Read: 5 min

Enterprise AI is moving from debate to reality. Most organisations use AI regularly in at least one business function, and AI tools are now a part of the daily routine of a large share of the global workforce.

Budgets have followed, with tens of billions spent on AI infrastructure and apps.

Have returns matched the investment? Not really.

Enterprises can only expect returns on AI investment if the AI can connect to the data and the workflows where value sits.

Legacy data systems, still widely used by enterprises, are a blocker here. In fact, integration is the main obstacle to effective AI adoption, and a significant share of AI project budgets is spent on technical debt instead of new capabilities. In this article, we will look at the modern data stack as a prerequisite for enterprise AI and explore the process of data modernisation in practice.

Why you cannot build AI on legacy data

You do not have the context

The average organisation runs hundreds of applications. Yet, our experience shows that only a small fraction of these are actually integrated into a data platform.

AI needs data across systems to be useful. With data spread across disconnected apps, AI models operate with limited context, so their ability to act on real operational data is also limited.

Your data quality and availability are low

To implement AI at scale, you need high data quality and availability.
When transformation logic is scattered across hundreds or thousands of opaque mappings or scripts, you cannot maintain standardised definitions.
Reliability and availability suffer, and so does the trust in the outputs. After all, if your users do not trust the data, why would they trust the AI output based on it?

Technical debt consumes your budget and energy

A large portion of data budgets goes to maintaining legacy systems, and teams spend a substantial share of their time fixing broken pipelines.
That leaves little to no capacity for building the new data capabilities that AI initiatives depend on.

Vendor lock-in slows change

Centralised legacy platforms tend to become too large to change. Their structure often makes refactoring difficult, which encourages workarounds and postpones modernisation until the situation becomes untenable.

At the same time, end-of-life deadlines, such as the looming date when Informatica PowerCenter enters a support-only mode, finally put migrations on the map. To capitalise on this sense of urgency, vendors offer cloud-based replacements that, in the long run, do not fully support AI implementation at scale.

Agentic AI comes with higher governance requirements

When agents can trigger decisions, governance requirements become stricter, calling for:

  • Traceability
  • Policy enforcement
  • Reproducibility

If issues cannot be reproduced or the data that informed a decision cannot be traced, regulatory scrutiny becomes harder to manage.
Legacy platforms typically do not support these needs well, which delays AI implementation.

Getting AI-ready with dbt

1. Data quality through testing and documentation

AI readiness starts with trust in data. The common gaps we see across the scores of migration projects we have delivered include the absence of a testing framework and limited documentation practices.

The most efficient way to address these is to apply software engineering practices to analytics engineering.

This enables:

  • Built-in testing from the start
  • Documentation
  • Version control
  • Custom tests for business logic, not only numeric checks

This approach improves visibility into transformations and increases trust in the data layer that AI systems use.
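
To make this concrete, here is a minimal sketch of a custom generic dbt test that encodes a business rule rather than a numeric check. The model and column names are hypothetical; a passing test is simply a query that returns no rows.

    -- tests/generic/discount_within_gross.sql (hypothetical names)
    -- Fails if any row carries a discount larger than its gross amount.
    {% test discount_within_gross(model, column_name, gross_column) %}

    select *
    from {{ model }}
    where {{ column_name }} > {{ gross_column }}

    {% endtest %}

Once defined, the test is attached to the relevant column in the model's YAML configuration and runs on every build alongside dbt's built-in tests, so the business rule is checked as routinely as a not-null constraint.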

2. Context through a semantic layer

Even with good data, inconsistent definitions across departments create confusion. For example, a metric like revenue can mean different things in accounting and marketing.

Without shared definitions, AI systems can return answers that do not match the intended meaning.

To address this, you need a semantic layer: a single source of truth with unified, centralised metric definitions shared across the company.

This builds trust between technical and business teams and allows you to scale AI use cases without cross-department friction.
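
To illustrate the problem in plain SQL, using hypothetical model names, here is how two teams can silently diverge on revenue, and what a single governed definition looks like as a dbt model on top of which the metric can then be exposed:

    -- Two teams, two "revenues" (hypothetical examples):
    --   Finance:   sum(amount - refund_amount) from posted invoices
    --   Marketing: sum(amount) from all orders, including cancellations

    -- One governed definition, maintained as a dbt model:
    -- models/marts/fct_revenue.sql
    select
        date_trunc('month', invoice_date) as revenue_month,
        sum(amount - refund_amount)       as revenue
    from {{ ref('stg_invoices') }}
    where status = 'posted'
    group by 1

With the definition living in one place, a dashboard and an AI agent asking for last quarter's revenue resolve to the same number.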

3. Code understanding and logic awareness with the Fusion Engine and LSP

AI readiness also depends on understanding transformation code and its intent. The dbt Fusion Engine and the Language Server Protocol (LSP) offer structured visibility into:

  • Models, tests, and documentation
  • SQL structure beyond plain text
  • Business logic awareness tied to how code runs and interacts with the data platform

Rather than copy-pasting code into a general-purpose model, this approach provides richer context for AI outputs, because the surrounding structure and logic are fully available.


Migrating from legacy ETL to dbt with Flowline

Legacy ETL migrations are notorious for going awry, often exceeding budgets or timelines. What can improve predictability is a structured, automated approach that reduces risk.

Flowline is a packaged software and service solution that converts Informatica legacy ETL to dbt using deterministic code conversion.
AI agents then handle the edge cases, drawing on context from our proprietary discovery phase. Finally, enterprise features such as dependency-aware sequencing and migration unit definition keep even large migrations on track.

The four-step migration process

Step 1: Automated conversion

First, legacy code is converted into dbt models using deterministic conversion. Typical automation reaches over 95%, accelerating time-to-value tenfold compared to a manual migration.
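
For context, the target of the conversion is an ordinary dbt project: each legacy mapping becomes one or more SQL models with explicit, ref-based dependencies. A deliberately trivial, hypothetical example of what a generated staging model can look like:

    -- models/staging/stg_customer.sql (hypothetical; real converted
    -- models mirror the logic of the original legacy mapping)
    with source as (

        select * from {{ source('crm', 'customer') }}

    ),

    renamed as (

        select
            customer_id,
            trim(customer_name)           as customer_name,
            cast(created_at as timestamp) as created_at
        from source

    )

    select * from renamed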

Step 2: Human review and reconciliation

Next, a highly experienced dbt developer resolves remaining issues and runs full data reconciliation.

Thorough validation practices ensure that the tables produced by the new ETL match the legacy outputs row by row. This step is human-led and powered by automation with dbt packages and proprietary AI agents.
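
One common way to check row-level equivalence, sketched below with hypothetical schema and table names, is a symmetric difference between the legacy output and the migrated model; community packages such as audit_helper wrap similar comparisons for dbt projects.

    -- Any rows returned indicate a mismatch between the legacy table
    -- and the migrated dbt output. (Hypothetical names; on some
    -- warehouses, e.g. BigQuery, use "except distinct".)
    (
        select * from legacy_dw.dim_customer
        except
        select * from analytics.dim_customer
    )
    union all
    (
        select * from analytics.dim_customer
        except
        select * from legacy_dw.dim_customer
    )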

Step 3: Refactoring for maintainability, performance, and cost

A mere lift-and-shift recreates the same problems on a new platform.

To actually improve maintainability, performance, and cost while keeping outputs consistent with the original system, you need to refactor the code.
This step combines AI-assisted refactoring with human oversight, adapting procedural legacy patterns into set-based, SQL-oriented patterns.
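
As a simplified illustration with hypothetical tables, a procedural pattern such as a cursor loop that accumulates a running balance row by row can usually be replaced with a single set-based statement using a window function:

    -- Set-based replacement for a "loop over rows and accumulate" pattern.
    -- The legacy version would typically be a stored-procedure loop updating
    -- one row at a time; this computes the same result in one pass.
    select
        account_id,
        transaction_date,
        amount,
        sum(amount) over (
            partition by account_id
            order by transaction_date
            rows between unbounded preceding and current row
        ) as running_balance
    from {{ ref('stg_transactions') }}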

Step 4: Training and handover

Making the most of cutting-edge capabilities takes a strong data culture.

Data teams familiar with legacy tools need to learn to maintain dbt-based implementations. To help them get up to speed with the new stack and make sure they can scale the set-up independently, Flowline includes a comprehensive training programme, fully tailored to the team and case.
At the same time, upon handover, business stakeholders receive reconciliation reports, so they have full clarity on what has changed.

Deterministic conversion vs AI conversion

Pure AI-based conversion is unreliable for end-to-end migrations because large mappings and generated code often exceed practical context limits.

Additionally, conversion requires specific knowledge of both source and target systems, including the data platform. Non-deterministic outputs can vary across runs, increasing cost and reducing reliability.

To get precise, repeatable results, you need end-to-end validation and reconciliation. Deterministic conversion is the clear winner here: it gives you the same result every time, which improves reliability and keeps costs under control.

Customising conversion to match internal standards

You can also customise the conversion process to align with internal standards and conventions. Python hooks allow you to run logic before or after specific conversion stages. Additionally, template-driven generation gives control over formatting, structure, and output patterns.

This approach enables alignment with project-specific practices while respecting the constraints of both the source and target systems.

Modernisations made easy

Legacy data systems block enterprise AI through fragmentation, low data trust, technical debt, lock-in, and weak governance support for agentic AI.
AI readiness hinges on:

  • Tested, documented transformations
  • A semantic layer for shared metric definitions
  • Deep understanding of the code

To reduce migration risks and support a governed layer designed for AI at scale, opt for a deterministic, validated migration path from legacy ETL to dbt.

Infinite Lambda’s end-to-end modernisation solution, Flowline, takes you from a legacy ETL system to an AI-ready modern data platform in weeks. To help you evaluate if you are ready to migrate, we offer a free assessment. Reach out and we will arrange it.
