Raw to Refined: A Framework for Data Sourcing

October 9, 2023

It’s no secret that firms with more usable data tend to perform better than those with less. Yet with so many error-prone steps between raw data and usable insight, where should firms start?

Effective data sourcing – from choosing the right provider to taking command of internal proprietary data and integrating data for business use – is critical to a successful data initiative. According to Gartner’s Hype Cycle for Data Management, 2022 report,[1] data discovery and management sit somewhere between the ‘Peak of Inflated Expectations’ and the ‘Trough of Disillusionment’. Gartner’s yearly Hype Cycle research explores the maturity, adoption, and social application of specific technologies, and this edition found that firms are well aware of the need for data programs. However, many struggle to implement their strategies and realize results from them.

This concern is borne out at many financial institutions, which are investing in multi-million-dollar teams to build their data strategy programs and source the right data for their business objectives. Global firms are estimated to collectively manage a wallet of more than $37 billion.[2]

In a three-part blog series, we’re examining firms’ data strategy challenges and the considerations to evaluate at each stage of the institutional investment data value chain. In this second article, our professionals explore data sourcing challenges and what to evaluate to create meaningful insights.

Begin with the End in Mind

When thinking about data discovery, it might seem logical to start with the questions, “What data do I need and where am I going to get it from?”

But taking a step back, one key to data discovery and sourcing success is starting with the desired business outcomes. Firms must determine the outputs they want before they decide which data inputs will best support quality results. Even among firms that follow this process, a common mistake is not getting granular enough about anticipated outcomes. It’s like deciding to bake a cake without determining what kind of cake before you buy the ingredients: you may end up missing an ingredient, buying more than you need, or buying ingredients of the wrong quality for the recipe.

So, what initial questions should firms consider when building their data sourcing strategy?

  • Who are the stakeholders that will consume the outputs from front to back?
  • What processes will the data be an input for?
  • Will the data be used internally and/or externally?
  • Who will be responsible for integrating and managing the data?
  • What other systems will the data interact with?
  • What specific reporting and analytics requirements do you want to fulfill?

Harness Internal Data

Internal data is an underutilized asset across enterprises today. Many firms don’t realize that they own first-party data: the proprietary data exhaust they generate and observe in the normal course of business.

A comprehensive catalog of existing internal data assets is the foundation of an external data sourcing strategy, because knowing what a firm already owns clarifies what it still needs to buy. However, internal data doesn’t come neatly packaged with the marketing collateral, schemas, and samples that clearly define a vendor’s product and access method. It takes a dedicated, collaborative effort from internal teams to understand, organize, and unify internal data. A firm’s data footprint consists of data from internally built and externally leveraged tools, in structured or unstructured formats.

Figure: Examples of a First-Party Data Footprint

Beyond identifying and cataloging data assets, firms must be ready to stage data for consumption and productize insights. To create staged, consumable data for downstream users, firms are investing in data aggregation to a central platform, processes to structure and link datasets to each other or to industry identifiers, and data normalization. Well-staged data – a key driver of the business flywheel – facilitates cross-functional collaboration and accelerates the productization of insights for both internal and external audiences.
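The staging steps named above (aggregating to a central platform, linking datasets to industry identifiers, and normalizing) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production pipeline: the `XREF` cross-reference table, its placeholder identifiers, the row layout (`code`, `field`, `value`), and the `stage` function are all hypothetical.

```python
from dataclasses import dataclass
from itertools import chain

# Hypothetical cross-reference table mapping internal security codes to an
# industry identifier. The values are placeholders, not real identifiers.
XREF = {
    "INT-001": "XX0000000001",
    "INT-002": "XX0000000002",
}


@dataclass(frozen=True)
class StagedRecord:
    identifier: str  # industry identifier used to join datasets
    attribute: str   # normalized field name
    value: float     # value coerced to a numeric type


def stage(*sources, xref=XREF):
    """Aggregate rows from several sources, link them to an industry
    identifier, and normalize field names and values.

    Rows with no identifier mapping are returned separately so a data
    steward can resolve them instead of silently dropping them.
    """
    staged, unmatched = [], []
    for row in chain.from_iterable(sources):  # aggregate all sources
        identifier = xref.get(row["code"].strip().upper())  # link step
        if identifier is None:
            unmatched.append(row)
            continue
        staged.append(StagedRecord(
            identifier=identifier,
            attribute=row["field"].strip().lower(),  # normalize names
            value=float(row["value"]),               # normalize types
        ))
    return staged, unmatched
```

Returning unmatched rows rather than discarding them is a deliberate choice in this sketch: unlinked records are often where catalog gaps show up first.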

There are two primary differentiators of firms that successfully harness first-party data to generate insights, manage risk, and gain operational efficiencies:

  1. The more proprietary and unique the data a firm commands, the more differentiated and valuable its insights are relative to the rest of the market.
  2. Firms that can aggregate more internal data into a centralized platform are better equipped for advanced data science projects, because they can generate meaningful features from their existing operating footprint.

Establish a Robust Provider and Product Selection Framework

Another area firms tend to stumble over is selecting the right data product from the right provider. Many firms over-index on convenience – choosing what’s familiar from past experiences, a ‘one-stop shop’ provider, or the cheapest option. Others may opt for the premium branded option, thinking it must be the best quality.

These approaches can work, but they cover only part of what best practice requires. The most accurate and reliable insights, especially for critical business operations, require full breadth and depth of data coverage. Sourcing multiple products in the same data category is a best practice for achieving full history, geographic and demographic coverage, and granularity. A robust selection framework helps decision-makers select the optimal products for their firm’s workflows and insight needs.

A basic evaluation framework should consider attributes across three categories: data quality, vendor relationship, and data uniqueness. Data quality, which indicates how reliable the data is for generating insights, is the most important attribute. A strong vendor relationship can help offset a product’s potential shortcomings. Data uniqueness is more of a front-office consideration, reflecting how much alpha potential the data has. A perfect partner that scores highly across all three categories rarely exists, and when one does, it is usually only for a limited time.
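One way to apply this framework is a weighted score per candidate product. The sketch below is hypothetical: the 0-5 rating scale, the `WEIGHTS` values, and the `min_quality` floor are assumptions chosen only to reflect the article’s point that data quality matters most, not a prescribed methodology.

```python
from dataclasses import dataclass

# Assumed weights: quality gets the largest share because the framework
# ranks it as the most important attribute. The exact values are illustrative.
WEIGHTS = {"quality": 0.5, "relationship": 0.3, "uniqueness": 0.2}


@dataclass(frozen=True)
class ProductScore:
    provider: str
    quality: float       # 0-5 rating of data reliability
    relationship: float  # 0-5 rating of the vendor relationship
    uniqueness: float    # 0-5 rating of differentiation / alpha potential

    def weighted(self, weights=WEIGHTS) -> float:
        """Combine the three category ratings into one weighted score."""
        return sum(getattr(self, name) * w for name, w in weights.items())


def rank(candidates, weights=WEIGHTS, min_quality=3.0):
    """Rank candidate products by weighted score, best first.

    Products below an assumed quality floor are excluded outright, so a
    strong relationship or uniqueness score cannot mask unreliable data.
    """
    eligible = [c for c in candidates if c.quality >= min_quality]
    return sorted(eligible, key=lambda c: c.weighted(weights), reverse=True)
```

For example, with these weights a product rated (quality 4, relationship 2, uniqueness 5) scores 3.6 and outranks one rated (3.5, 5, 1), which scores 3.45, while a product with quality 2 is filtered out regardless of its other ratings.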

Figure: Product Evaluation Criteria & Provider Personas (pie chart of product evaluation criteria)

After licensing, key feedback loops include measuring the data’s usage and impact and attributing that value back to the data catalog. The initial consumer group for a data product is likely to grow over time if the product is inherently useful.
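A catalog entry that records usage and attributed value can close this feedback loop. The sketch below assumes, hypothetically, that consuming teams report a monetary value for each use; how firms actually measure impact varies, and the `CatalogEntry` class, its fields, and the value-to-cost ratio are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    dataset: str
    annual_cost: float
    consumers: set[str] = field(default_factory=set)  # teams using the data
    attributed_value: float = 0.0  # value credited back by consumers

    def record_usage(self, team: str, value: float = 0.0) -> None:
        """Log a use of the dataset and credit any value the team attributes to it."""
        self.consumers.add(team)
        self.attributed_value += value

    @property
    def value_to_cost(self) -> float:
        """Attributed value per unit of licensing cost (0 if cost is unknown)."""
        return self.attributed_value / self.annual_cost if self.annual_cost else 0.0
```

Tracking `consumers` as a set makes it easy to see whether the initial consumer group is growing, which the article treats as a signal that the product is useful.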

Data licensing shouldn’t be daunting if the business objectives that inform data requirements are clear and the value of the data and insights to the business is measurable. In a fast-growing provider ecosystem, applying structure to decision-making shortens timelines and ensures licensing decisions align with desired business outcomes.

Implement an Agile Integration Program

There’s often a long lead time between selecting the right data product and generating useful insights for an organization. In the best-case scenario, the time to value from a new dataset is three to six months, not including the licensing process.

Varying integration options, data schemas, delivery frequencies, and maintenance schedules make managing data pipelines across providers another core challenge. Beyond a flexible, modern data platform that accommodates these requirements, firms should invest in scalable tooling and skilled resources to maximize data throughput across multiple data types and diverse business requirements.
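Differing schemas and delivery frequencies can be handled with a small per-provider configuration that maps each feed onto a shared schema and flags late deliveries. This is a minimal sketch: the `ProviderFeed` class, the provider names, and the column names are hypothetical, and a real platform would also need to handle maintenance windows and schema changes.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ProviderFeed:
    name: str
    column_map: dict    # provider column name -> shared column name
    delivery_days: int  # expected number of days between deliveries

    def to_common(self, row: dict) -> dict:
        """Rename mapped provider columns to the shared schema.

        Unmapped columns are dropped; mapped columns missing from the row
        are skipped rather than raising.
        """
        return {common: row[src] for src, common in self.column_map.items() if src in row}

    def is_late(self, last_delivery: date, today: date) -> bool:
        """Flag the feed if more time has passed than its delivery cadence allows."""
        return today - last_delivery > timedelta(days=self.delivery_days)
```

For example, a daily feed using `px_close`/`ticker` and a weekly feed using `ClosePrice`/`Sym` can both be mapped to `close_price`/`symbol`, so downstream users see one schema regardless of provider.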

What to consider in an agile integration program:

  1. From a tooling standpoint, low-code and no-code tools are here to stay. Self-service tools put staged data in the hands of downstream users with fewer dependencies.
  2. Resource constraints persist because firms compete for a small pool of engineers with financial domain expertise. Firms are investing heavily in their engineering training and recruitment pipelines and increasingly leveraging staff augmentation programs.
  3. Firms also face a build-or-buy decision. Firms that partner with an external provider that brings institutional knowledge, plus the tools and talent to integrate data for specific workflows onto a purpose-built platform, gain more operating leverage to focus in-house resources on their most differentiated proprietary data and technologies.

Setting the Stage for Meaningful Insights

Eighty percent of buy-side firms believe data budgets will rise over the next 12 months.[3] Defining a clear set of business objectives, staging internal data for consumption, implementing a robust evaluation framework, and designing an agile onboarding practice are all paramount to maximizing return on investment.

Returning to our baking example, many firms may find they already have some of the right ingredients. By leveraging these staples and best practices, firms can set the stage for well-fed insights and analytics programs, which is arguably where most enterprise value creation happens at institutional investment firms.

[1] Data Management Solutions Primer for 2022, Gartner, January 4, 2022

[2] Financial Market Data/Analysis: Global Share & Segment Sizing 2023, Burton-Taylor International Consulting, April 18, 2023

[3] Market Data Spending Is on a Roll, Coalition Greenwich, August 22, 2023


James Cicalo, Vice President, Sales Operations & Enablement
Vera Shulgina, Vice President of Product Management

Vera is Vice President of Product Management at Arcesium. She is responsible for the firm’s data strategy with a focus on driving value for Arcesium clients through data solutions and data partner integrations.
