Why the data layer determines whether AI pays off in capital markets
In this article
- Why the data layer is the primary bottleneck in capital markets AI deployments
- The four criteria that define truly AI-ready data
- How an AI-native data layer gives human analysts and AI agents a shared foundation
In capital markets, AI has moved "from 'cool to core'" — from proofs of concept into live deployments across investment research, trading, and risk workflows. In fact, 95% of wealth and asset management firms have now scaled AI to multiple use cases. The models are capable. The compute is available.
Yet Gartner predicts that through 2026, organizations will abandon 60% of AI projects unsupported by AI-ready data. The bottleneck is not AI capability. It is the data infrastructure feeding it.
Capital markets AI projects that stall share a common characteristic: the data they depend on does not yet satisfy all four requirements of AI-ready data at once.
The four requirements for AI-ready data in capital markets
The industry definition of AI-ready data is often imprecise. For many firms, it means clean data stored in a data lake. In practice, that is a starting point, not the destination. AI-ready data in capital markets must satisfy four criteria simultaneously: it must be discoverable, contextualized, governed, and licensed for AI use.
Discoverable
An AI agent — or the analyst configuring it — can find the right data product without knowing its technical location. If your pricing data spans three platforms and seven schemas, and the only reliable path to locating it runs through the data engineering team, that data is not discoverable by AI.
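Discoverability, in practice, means an agent can query a unified catalog by business concept rather than by platform or schema. A minimal sketch, with entirely hypothetical names and records (`CATALOG`, `find_products` are illustrative, not a real product API):

```python
# Illustrative sketch: a unified catalog lets an agent find data products by
# business concept, with no knowledge of which platform or schema holds them.
# All records and names here are hypothetical.

CATALOG = [
    {"name": "eod_equity_prices", "platform": "lake-a", "schema": "px_v3",
     "concepts": {"pricing", "equities"}},
    {"name": "fx_spot_rates", "platform": "warehouse-b", "schema": "fx_v1",
     "concepts": {"pricing", "fx"}},
    {"name": "issuer_reference", "platform": "lake-a", "schema": "ref_v2",
     "concepts": {"reference", "issuers"}},
]

def find_products(concept: str) -> list[str]:
    """Return data product names matching a business concept."""
    return [p["name"] for p in CATALOG if concept in p["concepts"]]
```

A search for `"pricing"` returns both pricing products even though they live on different platforms with different schemas; that indirection is what makes the data discoverable by AI.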
Contextualized
Without contextual metadata covering what a data product represents, how it was sourced, what its data quality characteristics are, and what caveats apply, AI models cannot distinguish reliable data from unreliable data. They treat both the same way.
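One way to make that distinction programmatic is to attach the contextual metadata directly to each data product, so an agent can consult it before trusting the data. A hedged sketch, assuming a simple metadata schema (the field names and threshold are illustrative, not a standard):

```python
# Illustrative sketch: contextual metadata carried by a data product so an
# agent can distinguish reliable data from unreliable data. The schema and
# quality threshold are assumptions, not an industry standard.
from dataclasses import dataclass, field

@dataclass
class DataProductContext:
    name: str
    source: str               # how the data was sourced
    quality_score: float      # 0.0-1.0, e.g. from automated profiling
    caveats: list[str] = field(default_factory=list)

def usable_for_analysis(ctx: DataProductContext, min_quality: float = 0.8) -> bool:
    """An agent checks the context before relying on the data."""
    return ctx.quality_score >= min_quality and "stale" not in ctx.caveats

good = DataProductContext("eod_equity_prices", "vendor-feed", 0.95)
bad = DataProductContext("legacy_quotes", "manual-upload", 0.95, ["stale"])
```

With this context attached, the two products above are no longer "treated the same way": one passes the agent's check and one does not, even though their raw contents might look identical.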
Governed
Access controls must be enforced programmatically. When an AI agent queries a data product, the system needs to verify that the requesting user — or the agent acting on their behalf — holds the appropriate entitlements. Only 6% of organizations fully trust their AI agents to operate autonomously — a number that lays bare the gap between enthusiasm for artificial intelligence and confidence in turning it loose on the most critical workflows. Robust, automated governance is what bridges that gap incrementally.
Licensed for AI use
Market data licenses were designed for human consumption. As AI moves into production across capital markets workflows, major exchanges and data vendors are actively revisiting their licensing frameworks to address AI and generative AI use cases. The standards are still evolving, and there is no settled industry consensus yet. Firms that feed licensed data into AI models without first verifying AI/ML usage rights carry meaningful audit exposure.
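Verifying AI/ML usage rights before every access is only practical if the licensing terms are machine-readable. A hedged sketch of what such a rule set might look like, with a deny-by-default check (the rule schema and field names are assumptions, not any vendor's actual license format):

```python
# Illustrative sketch: licensing terms encoded as machine-readable rules and
# checked before data reaches a model. The rule schema is an assumption;
# real vendor licenses are far more nuanced.

LICENSES = {
    "eod_equity_prices": {
        "display": True,
        "ai_ml_training": False,   # not licensed for model training
        "ai_ml_inference": True,   # licensed for inference-time use
    },
}

def verify_ai_use(product: str, use: str) -> bool:
    """Deny by default when AI/ML rights are absent or unverified."""
    return LICENSES.get(product, {}).get(use, False)
```

The deny-by-default posture matters: an unknown product, or a known product with no stated AI clause, fails the check rather than silently passing into a model and creating audit exposure.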
AI-ready data in capital markets must satisfy four criteria simultaneously: discoverable, contextualized, governed, and licensed for AI use.
The four AI-ready pillars: how does your firm measure up?
Before assessing your data infrastructure, it helps to anchor each pillar to a concrete question. If you're unsure of the answer to any of these, your firm has a gap worth addressing.
| Pillar | Ask yourself… |
|---|---|
| Discoverable | Can an AI agent — or the analyst configuring it — find your pricing, reference, or risk data without having to ask your engineering team? |
| Contextualized | Does every data product your AI touches carry provenance, data quality scores, and vendor caveats it can read and act on automatically? |
| Governed | When an AI agent queries a data product on behalf of an analyst, does it inherit the right access controls and entitlements — automatically, without a manual check? |
| Licensed for AI | Can your systems verify AI/ML usage rights before every data access — without a manual licensing review or a call to your legal team? |
A self-assessment across the four AI-ready data pillars.
Why the gap persists
Most firms have invested in individual components of this stack. A data catalog addresses discoverability. A governance platform manages access. A compliance team manages licensing. The challenge is integration. These systems were not designed to work together, and AI agents cannot bridge that gap on their own.
An AI agent that discovers a data product through one system has no mechanism to verify its data quality context in another system or confirm its licensing status in a third. The agent is structurally dependent on human intermediaries to fill those gaps. An agent executing a multi-step research workflow needs to discover, evaluate, verify permissions, confirm licensing, and access data in a single continuous operation. When any step requires a separate login or a human handoff, the automated workflow breaks down.
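The single continuous operation described above can be sketched as one pipeline in which every step is programmatic and any failure stops the workflow instead of falling back to a human handoff. All names and the catalog records are hypothetical:

```python
# Illustrative sketch of the continuous workflow: discover -> evaluate
# context -> verify permissions -> confirm licensing -> access. If any step
# cannot be answered programmatically, the product is skipped rather than
# escalated to a human. All names and records are hypothetical.

def run_research_step(catalog, user, concept, use="ai_ml_inference"):
    accessible = []
    for product in catalog:
        if concept not in product["concepts"]:
            continue                                   # 1. discover
        if product["quality_score"] < 0.8:
            continue                                   # 2. evaluate context
        if user not in product["entitled_users"]:
            continue                                   # 3. verify permissions
        if not product["license"].get(use, False):
            continue                                   # 4. confirm licensing
        accessible.append(product["name"])             # 5. access
    return accessible

CATALOG = [
    {"name": "eod_equity_prices", "concepts": {"pricing"}, "quality_score": 0.95,
     "entitled_users": {"analyst_a"}, "license": {"ai_ml_inference": True}},
    {"name": "legacy_quotes", "concepts": {"pricing"}, "quality_score": 0.95,
     "entitled_users": {"analyst_a"}, "license": {}},  # no AI rights stated
]
```

In the fragmented status quo, steps 2-4 each live in a different system, so the loop above cannot be written at all; the point of an integrated data layer is that it can.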
This fragmentation is precisely the infrastructure pattern behind Gartner's forecast. The window closes at the end of 2026 — we are inside it now. The AI capability is present. The integrated data layer is not.
The challenge is not individual components. It is that discoverability, context, governance, and licensing operate in separate systems. AI agents need all four within a single continuous workflow.
The AI infrastructure layer: connecting the four pillars
An AI-native data infrastructure layer that unifies discoverability, context, governance, and licensing in a single interface is the architectural response to this gap. The current leading technical standard for how AI agents interact with enterprise systems is Model Context Protocol (MCP) — at least for now, as the protocol landscape continues to evolve.
Think of MCP as a universal adapter for AI: just as a universal power adapter lets one device connect to any socket worldwide, MCP gives AI agents a single, standardized way to connect to any enterprise data system. Developed by Anthropic, MCP has rapidly emerged as a common standard for AI agent connectivity. Capital markets firms are already seeing results: Bridgewater's AIA Labs built their Investment Analyst Assistant on Claude, enabling analysts to generate Python code, create data visualizations, and work through complex financial analysis tasks at speed. This is not a future architecture — it is already in production at leading capital markets firms.
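Concretely, MCP runs over JSON-RPC 2.0, and an agent invokes a server-exposed capability with a `tools/call` request. The sketch below builds such a request envelope; the tool name and arguments are hypothetical, and only the envelope shape follows the MCP specification:

```python
# Hedged sketch: constructing an MCP "tools/call" request. MCP messages are
# JSON-RPC 2.0; the tool name "search_data_products" and its arguments are
# hypothetical examples, not part of any real server.
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request as used by MCP."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = mcp_tool_call(1, "search_data_products", {"concept": "pricing"})
```

Because every MCP server speaks this same envelope, an agent that can issue one `tools/call` can, in principle, reach any system a firm exposes behind the protocol — which is the "universal adapter" property in practice.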
When an AI agent accesses institutional data through an AI-native catalog, it receives the same engineer's notes, data quality caveats, and usage signals that a 10-year veteran would know intuitively. No subject matter expert needs to be consulted for every query. This is the institutional data knowledge that most firms currently hold in people rather than systems.
Firms that unify these four pillars in a single infrastructure layer are able to deploy AI at research velocity. Projects move from model validation to production in weeks rather than quarters, with confidence that the data their AI is working from is discoverable, contextualized, compliant, and correctly licensed.
Building the foundation before the model
The most productive shift a capital markets firm can make is to assess data infrastructure readiness before committing to AI model selection. McKinsey's November 2025 State of AI report paints a similarly sobering picture: just 6% of organizations qualify as true "AI high performers" — those attributing meaningful EBIT impact to AI. The differentiator is not model sophistication. It is whether the underlying data infrastructure enables AI to operate reliably at scale.
The firms that get this right are not necessarily those with the most advanced models. They are those whose data infrastructure makes every data product accessible, contextualized, and correctly licensed for AI use — by both human analysts and AI agents working in parallel.
This is a solvable infrastructure problem. It does not require replacing existing platforms or migrating data. It requires a metadata layer that connects them and surfaces the institutional data knowledge those platforms already hold.
The firms that get AI right are those whose data infrastructure makes every data product accessible and trustworthy, for both the analyst and the agent.
How DataHex Data Library enables AI-ready data
This is the problem DataHex was designed to solve. DataHex Data Library is an AI-native business data catalog built for capital markets, designed so that AI agents consume the same institutional data knowledge, data quality signals, and licensing metadata that human analysts use.
| AI-ready pillar | DataHex capability | Outcome for front office and AI agents |
|---|---|---|
| Discoverable | Vendor documentation ingested automatically via RAG. Your entire data estate, searchable by business concept. | Find any data product in seconds. No schema knowledge or engineering support needed. |
| Contextualized | Every data product automatically enriched with provenance, data quality scores, vendor caveats, and usage history. | Context built in, not bolted on. AI acts on data it can trust. |
| Governed | Access controls unified across your existing platforms. Agents inherit the same entitlements as the humans they work for. | One governance framework for humans and agents alike. Audit-ready by default. |
| Licensed for AI | Licensing terms encoded as machine-readable rules. Agents verify AI/ML usage rights automatically before every access. | Licensing is verified automatically before every data access. No surprises, no audit exposure. |
DataHex Data Library capabilities mapped to the four AI-ready pillars.
DataHex Data Library operates as a lightweight metadata layer with no data migration and no infrastructure overhaul required. Unlike building data intelligence capabilities in-house — which typically involves lengthy development cycles and high failure rates — DataHex deploys on top of your existing platforms. Your data estate is already there; DataHex connects it, contextualizes it, and makes it AI-ready. Rapid deployment, purpose-built platform, faster time to value.