Building AI Agents for Investment Operations: A Step-by-Step Success Journey
Imagine you’re considering how to layer AI agents into investment operations, with a long-term vision of an agentic middle office. Nearly three-fourths (73%) of asset management industry executives say AI is critical to their organization’s future, according to Grant Thornton. [i]
The question is how to get there. The market offers plenty of compelling capabilities. You can find model sophistication with pure-play AI providers, connectivity and scale on top-tier data infrastructure platforms, and domain expertise offered by industry specialists.
But you still need a way to bring all of that together. The value you can achieve depends on the realities of your investment operations. Regulatory overlaps, fiduciary obligations, multi-stakeholder processes, and data ontology all affect how you implement.
When capabilities meet constraints, we consistently see the same pattern. Clients are telling us they are advancing fastest with AI agents when they apply discipline. They move quickly and carefully at the same time, with four common underlying design principles.
Focus on one workflow, one outcome, one owner
Mandates like “use AI to improve operations” simply can’t get traction. An effective agent initiative begins by putting clear boundaries around the problem you’re trying to address: map each workflow to a measurable outcome and an owner.
For example, a fund controller at a multi-strategy firm reviews NAV figures every morning. Simple automation could detect issues like material P&L swings, large changes in expense accruals, and valuation inputs that look unusual. An agent could propose explanations for anomalies and recommend a course of action for the controller or a delegate to review and act on.
When an agent’s scope is bounded, the controller can trace the full lineage of a pricing discrepancy, which gives them the confidence to resolve an issue without needing to validate everything else the process touches.
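The detection step in that morning review can be sketched in a few lines. This is a minimal illustration, not a production control: the NavSnapshot fields, the flag_nav_anomalies function, and both tolerance values are hypothetical and would be set by the workflow owner.

```python
from dataclasses import dataclass

# Hypothetical tolerances; real values would be set by the fund controller.
PNL_SWING_LIMIT = 0.02       # flag daily P&L moves above 2% of prior NAV
ACCRUAL_CHANGE_LIMIT = 0.10  # flag expense accrual changes above 10%


@dataclass
class NavSnapshot:
    fund: str
    prior_nav: float
    daily_pnl: float
    prior_accrual: float
    current_accrual: float


def flag_nav_anomalies(snap: NavSnapshot) -> list[str]:
    """Return human-readable flags for the controller to review."""
    flags = []
    pnl_ratio = abs(snap.daily_pnl) / snap.prior_nav
    if pnl_ratio > PNL_SWING_LIMIT:
        flags.append(f"{snap.fund}: P&L swing of {pnl_ratio:.1%} of prior NAV")
    if snap.prior_accrual > 0:
        change = abs(snap.current_accrual - snap.prior_accrual) / snap.prior_accrual
        if change > ACCRUAL_CHANGE_LIMIT:
            flags.append(f"{snap.fund}: expense accrual changed {change:.1%}")
    return flags


# Illustrative numbers only: a 3% P&L swing and a 4% accrual change.
snapshot = NavSnapshot("Fund A", prior_nav=100_000_000, daily_pnl=-3_000_000,
                       prior_accrual=50_000, current_accrual=52_000)
flags = flag_nav_anomalies(snapshot)
print(flags)
```

Only the P&L swing exceeds its tolerance here, so the controller sees a single flag. Proposing an explanation for that flag is where an agent, rather than simple automation, would come in.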
As bounded agents prove reliable, you could then look to new agents for adjacent parts of the workflow, such as pricing updates from brokers, each held to the same standard of one workflow, one outcome, one owner. The controller’s morning review might eventually see end-to-end coverage by agents, but each one earns its place individually.
Use business outcomes to set the bar
Measuring the number of AI workloads in production might seem like the right metric for value, but workload counts almost always paint a misleading picture. The number of workloads tells you nothing about whether the agent improved anyone’s work or created operational scale.
That’s why developing an agent should start with a definition of expected business impact. When you agree on a small set of success measures and a go-live threshold in advance, both in terms of output quality and expected efficiency gains, you have a shared standard for what “working” means. Without that clarity, you don’t have a mechanism for deciding when to stop, deploy, or scale the agent.
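One way to make a go-live threshold concrete is to write it down as data before the pilot starts. The sketch below assumes two hypothetical measures, reviewer acceptance rate for output quality and weekly hours saved for efficiency; the names GoLiveCriteria and ready_for_production and all numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoLiveCriteria:
    min_accuracy: float     # share of agent outputs accepted by reviewers
    min_hours_saved: float  # weekly analyst hours saved versus the baseline


def ready_for_production(accepted: int, reviewed: int,
                         baseline_hours: float, pilot_hours: float,
                         criteria: GoLiveCriteria) -> bool:
    """Compare pilot results with the thresholds agreed in advance."""
    if reviewed == 0:
        return False  # no evidence yet, so no go-live
    accuracy = accepted / reviewed
    hours_saved = baseline_hours - pilot_hours
    return (accuracy >= criteria.min_accuracy
            and hours_saved >= criteria.min_hours_saved)


# Hypothetical thresholds and pilot results.
criteria = GoLiveCriteria(min_accuracy=0.95, min_hours_saved=10.0)
decision = ready_for_production(accepted=188, reviewed=200,
                                baseline_hours=40.0, pilot_hours=25.0,
                                criteria=criteria)
print(decision)
```

In this example the pilot saves 15 hours a week but reviewers accepted only 94% of outputs, so the agent does not clear the bar. Because the criteria were fixed first, that result is a decision rather than a debate.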
Consider ad hoc analysis of investment performance and returns. Today, finding out why a particular strategy underperformed its benchmark over the last quarter takes PM teams hours to pull data from multiple systems, reconcile across sources, and build manual analyses.
An agent grounded in an authoritative data foundation could handle this in steps. It could pull performance attribution broken down by sector and factor exposures and cross-reference recent market and position changes. From there, it could produce a specific analysis with a corresponding confidence score. Maybe the strategy’s emerging markets tilt detracted 140 basis points as EM equities sold off, compounded by a currency hedge that expired two weeks before quarter-end. Delivering that analysis in minutes helps the PM adjust and move on to new challenges.
If you establish business outcome metrics, rather than processing time, as the measure of performance before the agent goes live, you have the data you need to evaluate whether it should stay in production.
Architect a domain-aware data foundation and orchestration layer
The most dangerous AI output looks right but references wrong or incomplete information. If your conversational interfaces give you answers based on stale data or competing versions of the truth, you’ve created operational risk. And you’ve exposed yourself to fiduciary and regulatory consequences.
Building reliability starts by making sure agents use a set of definitive sources. Each one should have clear ownership, version control, and an audit trail that records what the agent accessed and when.
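An audit trail of this kind can start as an append-only log of access records. The following is a minimal sketch; the field names, the record_access helper, and the example source and agent identifiers are assumptions for illustration, and a real deployment would write to durable, tamper-evident storage rather than an in-memory list.

```python
import json
from datetime import datetime, timezone

# In-memory stand-in for a durable, append-only audit store.
audit_log: list[dict] = []


def record_access(agent_id: str, source: str, version: str, owner: str) -> dict:
    """Record which source an agent read, at which version, and when."""
    entry = {
        "agent_id": agent_id,
        "source": source,
        "version": version,
        "owner": owner,
        "accessed_at": datetime.now(timezone.utc).isoformat(),
    }
    audit_log.append(entry)
    return entry


# Hypothetical identifiers for illustration.
entry = record_access("nav-review-agent", "pricing_master",
                      "2025-06-30.v3", "valuation-team")
print(json.dumps(entry, indent=2))
```

Storing the source version alongside the timestamp is what lets a reviewer later reproduce exactly what the agent saw.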
Investment operations workflows don’t run on generic logic. Settlement instructions, corporate action elections, reconciliation breaks, and NAV calculations each follow distinct rules, sequencing requirements, and tolerance thresholds that vary by asset class, counterparty, and fund structure. An agent operating in this environment must combine access to clean data with an understanding of the domain it’s working in.
Orchestration layers route tasks to the right specialized capability and combine their outputs. For example, orchestration helps a reconciliation agent know the difference between a timing break and a genuine discrepancy, helps a corporate actions agent understand the election hierarchy for a particular fund, and helps a NAV agent sequence dependencies correctly before triggering a calculation.
When that orchestration is domain-aware, agents can act with confidence rather than surfacing every edge case for human review. That lets your teams build on top of the agents’ output rather than clean up after it. After your data and orchestration foundation is in place, federating agent development to subject-matter experts is where the impact happens.
In late 2022, Balyasny Asset Management established an Applied AI team: a centralized group of 20 researchers, engineers, and domain experts tasked with building AI-native tools that embed directly into team-level workflows.
Balyasny built the evaluation pipeline and data governance before deploying across roughly 95% of the firm’s 180 investment teams. [ii] If you’re evaluating your own readiness to deploy agents at scale, the question to ask is whether your data foundation would hold up under the same scrutiny.
Build for autonomy in stages
The most expensive mistakes happen when you give agents too much autonomy. An agent that recommends a position adjustment carries different risk than an agent that executes one with settlement finality and client-visible outcomes.
Start by letting the agent observe. It watches the workflow, flags what it sees, and you evaluate whether the flags are useful. Once you trust the observation, let the agent recommend a course of action. You still make the call.
After enough accurate recommendations, you can let the agent act with your approval before execution. Full automation, where the agent acts without a human checkpoint, only makes sense for workflows where the cost of a wrong action is low and the agent has a long track record in the earlier stages.
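The staged progression can be expressed as an explicit gate in code. This is a sketch under stated assumptions: the four stage names follow the text, while the Autonomy enum, the function names, and the promotion thresholds (minimum sample size and accuracy) are hypothetical choices each firm would set for itself.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    OBSERVE = 1
    RECOMMEND = 2
    ACT_WITH_APPROVAL = 3
    AUTONOMOUS = 4


def may_execute(stage: Autonomy, human_approved: bool) -> bool:
    """Only the last two stages can execute, and stage 3 needs approval."""
    if stage is Autonomy.AUTONOMOUS:
        return True
    if stage is Autonomy.ACT_WITH_APPROVAL:
        return human_approved
    return False


def next_stage(stage: Autonomy, accurate: int, total: int,
               low_error_cost: bool,
               min_samples: int = 100, min_accuracy: float = 0.98) -> Autonomy:
    """Promote one stage at a time, and only on a sufficient track record."""
    if stage is Autonomy.AUTONOMOUS or total < min_samples:
        return stage
    if accurate / total < min_accuracy:
        return stage
    # Full automation is reserved for workflows where a wrong action is cheap.
    if stage is Autonomy.ACT_WITH_APPROVAL and not low_error_cost:
        return stage
    return Autonomy(stage + 1)


print(next_stage(Autonomy.OBSERVE, accurate=99, total=100, low_error_cost=False))
```

Keeping the gate separate from the agent logic means promotion is a recorded decision based on evidence, not a configuration change someone makes in passing.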
Consider a portfolio manager rebalancing a multi-asset portfolio. At the first stage of the workflow, the agent could monitor drift from target allocations and flag when a portfolio exceeds tolerance. The agent’s output leaves the decision on how to proceed with the PM.
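The monitoring stage amounts to comparing current weights with targets. A minimal sketch follows, assuming weights are expressed as fractions of portfolio value and a single hypothetical tolerance of five percentage points applies to every asset class; the drift_flags function and the sample allocations are illustrative.

```python
def drift_flags(current: dict[str, float], target: dict[str, float],
                tolerance: float = 0.05) -> dict[str, float]:
    """Return the signed drift for each asset class outside tolerance."""
    flags = {}
    for asset, target_weight in target.items():
        drift = current.get(asset, 0.0) - target_weight
        if abs(drift) > tolerance:
            flags[asset] = round(drift, 4)
    return flags


# Illustrative allocations only.
target = {"equities": 0.60, "bonds": 0.30, "cash": 0.10}
current = {"equities": 0.67, "bonds": 0.26, "cash": 0.07}
flags = drift_flags(current, target)
print(flags)
```

Here only equities are flagged, at seven points above target. The function reports the breach and nothing more, which matches the observe stage: the PM decides what to do next.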
At the next stage, the agent would propose a rebalance with specific trades, sized and sequenced, accounting for tax lots, transaction costs, and restricted securities. The PM reviews the proposed trades and approves or modifies before routing the orders for execution.
In a fully autonomous workflow, you might let the agent execute within predefined parameters, such as no single trade above a set notional and only during normal market hours, with a PM reviewing after the fact. An audit trail would give compliance officers, client reporting teams, and operations staff a complete record of what the agent did and why. This represents the next stage of AI evolution, but we’re not seeing firms use it in production yet.
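Predefined parameters like these can be enforced as a pre-trade check that runs before any order is routed. The sketch below is illustrative only: the notional cap and trading window are hypothetical values, the timestamp is assumed to be in the exchange’s local time already, and market holidays are ignored.

```python
from datetime import datetime, time

# Hypothetical limits; a real deployment would load these from governed config.
MAX_NOTIONAL = 1_000_000
MARKET_OPEN, MARKET_CLOSE = time(9, 30), time(16, 0)


def within_guardrails(notional: float, at: datetime) -> tuple[bool, str]:
    """Return (allowed, reason) so every decision can be written to the audit trail."""
    if abs(notional) > MAX_NOTIONAL:
        return False, "notional above limit"
    is_weekend = at.weekday() >= 5
    in_session = MARKET_OPEN <= at.time() <= MARKET_CLOSE
    if is_weekend or not in_session:
        return False, "outside normal market hours"
    return True, "ok"


# A Tuesday mid-morning order under the cap passes the check.
print(within_guardrails(500_000, datetime(2025, 6, 3, 10, 0)))
```

Returning a reason along with the verdict gives compliance and operations staff the “why” as well as the “what” for each action the agent takes or declines.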
Each stage of proven reliability gives you confidence to move the agentic component downstream in the workflow. With that record in place, a promising pilot becomes a permanent part of your infrastructure.
From prototype to production
Given the speed of agentic AI deployment and innovation, trying to follow the technology feature by feature is a lost cause. A principles-based approach lets you avoid confusion, paralysis, and drift. Applying all four of these principles to every initiative gives each agent the best chance of reaching production.
Authored By
Dmitry (Mitya) Miller
Dmitry (Mitya) Miller is the Managing Director, General Manager for Aquata, Arcesium’s comprehensive self-service data platform purpose-built for the investment management industry. Mitya is responsible for overseeing all aspects of the Aquata business, including P&L ownership, customer base growth, customer delivery and engagement, and product roadmap.
[i] Grant Thornton, 2025. https://www.grantthornton.com/insights/articles/asset-management/2025/ai-is-transforming-asset-management
[ii] OpenAI, 2026. https://openai.com/index/balyasny-asset-management/