One major flaw of generated code for data engineering or data analysis is the “validation” gap. With traditional software, you can test code against requirements and quality standards to see if it works. AI can automate both code and testing. But when an AI agent produces an analytics outcome from disparate data, how can you be sure it is correct? As Sridhar Ramaswamy, CEO of Snowflake, told Fast Company, “Coding agents tend to break down when they’re introduced to complex enterprise constraints like regulated data, fine-grained access controls, and audit requirements.”i
In many cases, the only way to test is to redo the work, tracing exactly how the agent obtained each data element. The analysis is often ad hoc and novel, so it cannot be verified with repeatable tests. Speeding up code creation doesn't automate the difficult parts.
For example, if an investment firm launches a new fund or strategy, it may need to onboard a new data vendor feed. Mapping new data formats to internal schemas used to take weeks. Today, the work is a mix of repetitive tasks that AI speeds up and complex decisions, such as identifier conventions, timing cutoffs, corporate actions handling, and reconciliation rules, that only surface once the data hits real positions. AI will generate a data pipeline very quickly (we know; we already do it in production every day), but it won't remove the need for experts to design how the new data should be handled and to validate the new feed.
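To make the division of labor concrete, here is a minimal sketch of a vendor feed mapping. All field names, the `FeedRules` structure, and the specific rules are hypothetical; the point is that the mechanical column renaming is the part AI generates easily, while the explicit rules encode decisions only an expert can make.

```python
from dataclasses import dataclass
from datetime import time

# Hypothetical vendor-to-internal field mapping. Renaming columns is the easy,
# AI-generated part; the expert judgment lives in the explicit rules below.
FIELD_MAP = {
    "VendorPx": "close_price",
    "VendorQty": "quantity",
    "Sedol": "security_id",  # which identifier convention wins is a design choice
}

@dataclass
class FeedRules:
    identifier_scheme: str    # e.g. "SEDOL" vs "ISIN": an expert decision
    pricing_cutoff: time      # prices after this cutoff belong to the next day
    apply_corp_actions: bool  # is the feed already corporate-action adjusted?

def map_row(row: dict, rules: FeedRules) -> dict:
    """Rename vendor fields and tag the row with the rules it was loaded under."""
    mapped = {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP}
    mapped["_identifier_scheme"] = rules.identifier_scheme
    mapped["_cutoff"] = rules.pricing_cutoff.isoformat()
    return mapped

rules = FeedRules(identifier_scheme="SEDOL",
                  pricing_cutoff=time(16, 30),
                  apply_corp_actions=False)
print(map_row({"VendorPx": 101.5, "VendorQty": 200, "Sedol": "B0YQ5W0"}, rules))
```

Tagging each row with the rules in force makes later reconciliation auditable: a mismatch can be traced back to a cutoff or identifier decision rather than debugged blind.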
Similarly, when your custodian calls it "market value," and your admin calls it "fair value," and they mean slightly different things, consolidating the position data into a single book of record requires judgment, not just transformation. Post-acquisition integrations make this even more complicated. You have two data environments based on different assumptions, and no one fully understands either one.
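A sketch of what "judgment, not just transformation" can look like in code, with an illustrative tolerance and a hypothetical policy that treats the administrator as the book of record: instead of silently picking one value, the merge records the discrepancy and escalates when it exceeds what differing conventions can explain.

```python
# Illustrative threshold: relative differences below this are assumed to be
# explainable by the custodian/admin valuation conventions differing.
TOLERANCE = 0.005  # 0.5%

def consolidate(custodian_mv: float, admin_fv: float) -> dict:
    """Merge two semantically different value fields into one book-of-record value."""
    diff = abs(custodian_mv - admin_fv) / max(abs(custodian_mv), 1e-9)
    return {
        "book_value": admin_fv,            # policy decision: admin is book of record
        "source_discrepancy": round(diff, 6),
        "needs_review": diff > TOLERANCE,  # a judgment call, not a transformation
    }

print(consolidate(1_000_000.0, 1_004_000.0))  # 0.4% apart: within tolerance
print(consolidate(1_000_000.0, 1_020_000.0))  # 2% apart: escalate to a human
```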
Finally, migrating from legacy platforms reveals another mix of AI benefits and shortcomings. AI can parse and translate the code, but not the assumptions baked into it. Stored procedures might be expressed in code, but they don’t include internal knowledge about code written a decade ago without documentation.
The common theme in these examples is that AI speeds up the creation of artifacts, but it does not resolve semantic disagreements between systems, unclear business logic, or upstream data quality problems. To be confident in analytical outcomes, you need to address all of these. The real bottleneck is trust.
From transformation to trust
Looking deeper into the data lifecycle shows where the trust bottleneck occurs. These are the next wave of challenges that AI must automate credibly to make deeper advances. In each case, AI agents can become core elements of data engineering tooling: generating data pipelines and powering conversational interfaces that help users query positions, detect anomalies, and become more productive in managing data and reporting on it. These gains are well positioned to accelerate.
Data quality and observability represent a major area where AI has made genuine improvements. Automated removal of duplicates, standardization, and filling in gaps can reduce the need for manual cleanup. Root cause analysis agents can identify issues more quickly. However, the larger challenge is operational accountability.
When a bad price slips through or two systems start showing different numbers, an alert is only helpful if someone has already determined who will investigate, what level of tolerance is acceptable, and what authority they have to fix it. Otherwise, you simply receive a quicker notification of a problem that still takes just as long to resolve.
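The accountability metadata described above can itself be made explicit. This is a minimal sketch under assumed names (`CheckPolicy`, `evaluate`, the basis-point tolerance): each check carries an owner, an acceptable tolerance, and the owner's authority to fix, so an alert arrives with its resolution path attached.

```python
from dataclasses import dataclass

# Hypothetical accountability metadata attached to each data quality check.
@dataclass
class CheckPolicy:
    owner: str            # who investigates
    tolerance_bps: float  # deviation (in basis points) that is acceptable
    can_override: bool    # whether the owner may correct the value directly

POLICIES = {
    "price_deviation": CheckPolicy(owner="ops-pricing",
                                   tolerance_bps=50.0,
                                   can_override=True),
}

def evaluate(check: str, deviation_bps: float) -> str:
    """Route a detected deviation according to the check's pre-agreed policy."""
    p = POLICIES[check]
    if deviation_bps <= p.tolerance_bps:
        return "pass"
    action = "correct" if p.can_override else "escalate"
    return f"alert -> {p.owner}: {action}"

print(evaluate("price_deviation", 30.0))   # within tolerance
print(evaluate("price_deviation", 120.0))  # routed to its owner with authority to fix
```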
Data discovery and documentation also benefit from natural language interfaces that truly lower access barriers. A portfolio manager can ask a question without having to write SQL or know which table to query. But the bigger problem is that the answer appears definitive while the assumptions behind it remain hidden.
A similar consideration exists with lineage parsing from SQL logs. While natural language search over data assets is measurably better than it was two years ago, the more complex problem is defensibility under pressure. The stakes get higher when an auditor asks how a reported number was derived or a regulator requests evidence of a calculation's inputs.
When someone asks, "What's my net exposure to China?" the interface provides a number. However, it can be unclear about which positions are included, which legal entities matter, and as of what time the data is relevant. The underlying data may already be filtered based on permissions the user cannot see, leading people to act on information they do not fully understand. Or it may show which data tables led to a calculation, but it won't reveal the business decisions or manual overrides that triggered data changes. AI can track data flows, but it can’t reconstruct the reasoning behind them.
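One way to narrow this gap is to make the answer carry its assumptions. The sketch below is illustrative (the field names, the country-of-risk classification, and the position structure are all assumptions): the exposure number is returned together with the as-of time, the classification scheme used, and the entities included, so it can be defended rather than merely displayed.

```python
from datetime import datetime, timezone

def net_exposure(positions: list[dict], country: str, as_of: datetime) -> dict:
    """Compute net exposure and return the assumptions behind the number."""
    included = [p for p in positions if p["country_of_risk"] == country]
    return {
        "value": sum(p["exposure"] for p in included),
        "as_of": as_of.isoformat(),
        "classification": "country_of_risk",  # vs. country of listing or domicile
        "positions_included": len(included),
        "entities": sorted({p["legal_entity"] for p in included}),
    }

positions = [
    {"country_of_risk": "CN", "exposure": 5.0, "legal_entity": "FundA"},
    {"country_of_risk": "US", "exposure": 3.0, "legal_entity": "FundA"},
    {"country_of_risk": "CN", "exposure": 2.0, "legal_entity": "FundB"},
]
print(net_exposure(positions, "CN", datetime(2025, 6, 30, tzinfo=timezone.utc)))
```

The structured result does not solve the harder problem of manual overrides and hidden permissions, but it turns "a number" into an answer an auditor can interrogate.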
In each case, AI absolutely provides useful output for the user. But it’s only the first step in the process. Firms need definitions, governance, and contextual rules in place first to make sure that output can be trusted and valid. Closing that gap is essential for “cool demos” to rise to the level of production usage.
The need for context
This is where the conversation about AI in data engineering gets interesting. What matters isn’t whether the technology has plateaued, but whether the leading edge is shifting toward context and meaning. The same position data means something different to a portfolio manager sizing a trade, an operations team settling it, a risk officer stress-testing it, or a compliance analyst reporting it.
Each context comes with its own assumptions about the data, including cutoff times, inclusion rules, and aggregation logic. This leads to different actions. AI can transfer data between systems faster than before, but it can't change the fact that "cash position" has different meanings for a trader (buying power), an operations team (settled funds), and a treasury function (liquidity available for redemptions). No amount of code generation can help if the requirements aren't clearly defined with context in mind.
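The "cash position" example can be made concrete. In this sketch the raw record and the three formulas are illustrative assumptions; the point is that the same underlying data yields three different numbers because the definition, not the data movement, is where the disagreement lives.

```python
# One raw cash record, three context-dependent readings (illustrative fields).
raw = {
    "settled": 100.0,
    "pending_settlement": 20.0,
    "margin_locked": 15.0,
    "redemption_reserve": 10.0,
}

CASH_DEFINITIONS = {
    # trader: buying power
    "trader": lambda r: r["settled"] + r["pending_settlement"] - r["margin_locked"],
    # operations: settled funds only
    "operations": lambda r: r["settled"],
    # treasury: liquidity available for redemptions
    "treasury": lambda r: r["settled"] - r["redemption_reserve"],
}

for role, definition in CASH_DEFINITIONS.items():
    print(role, definition(raw))
```

Encoding each context's definition explicitly, rather than letting one team's convention win silently, is exactly the requirements work that code generation cannot do on its own.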
Solving these problems represents the leading edge of AI development in data engineering. The real disruption will come from companies that recognize the operational implications of every field, every cutoff, and every implied action, and that encode that domain expertise into their platforms instead of bolting it on later.