Investors optimize their portfolios with Apache® Spark™ on Databricks

This is another case study reinforcing the benefits of having the right tools for the right job, particularly when matched by a clear strategic vision of how to differentiate in a crowded market. In this case, Omer Cedar, CEO of Omega Point, identifies the advantages gained from combining the latest releases of Databricks and Apache Spark, which empowered the data science team to deliver real value to Omega Point’s customers. The data scientists were able to take advantage of the flexibility and speed of Apache Spark and Databricks to focus on measuring investment decisions in capital markets.

Summary

Omega Point enables financial and strategy professionals to make better investment decisions by unlocking insights from billions of data points with advanced analytics.
The release of their flagship product, the Omega Point Portfolio Intelligence Platform, was delayed due to the lack of reliable infrastructure and access to the most recent Apache® Spark™ releases.
Databricks enabled Omega Point to accelerate the delivery of its product, increasing uptime of its production infrastructure while reducing maintenance costs.

Benefits

Higher productivity of the data science and engineering teams led to the faster delivery of core features by six months.
Improved production infrastructure uptime by 80%.
Reduced maintenance costs by 75%.

Business Background

Every day, financial and strategy professionals rely on data to help inform their portfolio and business decisions. When millions of dollars are on the line, the more insights they can glean from the data, the better. However, the reality is that the systematic process of generating insights — from the acquisition of the data, to the analysis, interaction, and consumption of insights — requires significant amounts of capital and talent, at levels only available to a small number of companies.

Omega Point’s vision is to enable finance and strategy professionals to benefit from the recent availability of such data, and to this end is building products that help them tap into large-scale datasets. These include satellite imagery, web traffic, earnings transcript sentiment, and other observable data known as “economic exhaust” or “alternative datasets”. With Omega Point products, users discover valuable economic insights without having to build complex analytics infrastructure, hunt for and purchase large numbers of individual datasets, and hire teams of data scientists to analyse them.

Their flagship product, the Omega Point Portfolio Intelligence Platform, enables investment managers to understand, visualize, and optimize their portfolios by uncovering each portfolio’s exposure to 50+ relevant market factors using cutting-edge data science.
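The case study does not describe how Omega Point computes factor exposures. As a generic illustration of what "exposure to a market factor" means, the minimal sketch below estimates a single-factor exposure (often called beta) as the covariance of portfolio and factor returns divided by the variance of the factor returns. The function name `factor_exposure` and all return figures are hypothetical and chosen only for illustration.

```python
from statistics import mean

def factor_exposure(portfolio_returns, factor_returns):
    """Estimate a portfolio's exposure (beta) to one factor by simple least squares."""
    if len(portfolio_returns) != len(factor_returns):
        raise ValueError("return series must have the same length")
    p_bar = mean(portfolio_returns)
    f_bar = mean(factor_returns)
    # Sum of cross-deviations (proportional to the covariance)
    cov = sum((p - p_bar) * (f - f_bar)
              for p, f in zip(portfolio_returns, factor_returns))
    # Sum of squared factor deviations (proportional to the variance)
    var = sum((f - f_bar) ** 2 for f in factor_returns)
    if var == 0:
        raise ValueError("factor returns have zero variance")
    return cov / var

# Hypothetical daily returns, for illustration only
factor = [0.01, -0.02, 0.015, 0.0, -0.005]
# Constructed so the portfolio moves twice as much as the factor
portfolio = [2 * f + 0.001 for f in factor]
beta = factor_exposure(portfolio, factor)
```

A platform covering 50+ factors would presumably estimate many exposures jointly rather than one at a time; this one-factor version only shows the underlying idea.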

“As a fast-growing technology company, we are hyper-focused on solving the most challenging data problems and creating value for our customers. Databricks provides the advanced analytics capabilities to enable our teams to do just that.”

– Omer Cedar, CEO, Omega Point

Challenges

Extracting insights from “alternative datasets” requires cutting-edge data processing capabilities to infer valuable information from a multitude of seemingly low-value data. Omega Point selected Apache Spark because of its scalability and flexibility to tackle advanced analytics use cases, such as multi-kernel learning techniques. The initial approach to building the Omega Point Portfolio Intelligence Platform was a self-managed cloud-based PaaS to deploy Spark clusters, various open source tools for a data science notebook environment, and Jenkins to run production Spark jobs.

This approach proved to be limiting in a few ways:

Unreliable cluster performance. Simple tasks such as launching, scaling, and terminating Spark clusters were complicated and unreliable. Frequent restarts of the clusters were necessary to address the underlying instability of the system, which severely limited productivity.
Delayed access to the latest versions of Spark. After the release of a new Spark version, the Omega Point team had to wait several months because the legacy vendor could not incorporate the new version in its product. This limited the pace of application development because the developers did not have access to the critical functionalities of each new Spark release.
Siloed development environments. The data science and data engineering teams each had different preferences in languages and toolkits, which required them to have separate systems. When they needed to collaborate on an initiative, they had to move data between systems, which was slow and frequently resulted in data loss.

The infrastructure instability required Omega Point to devote three full-time data engineers to managing clusters and supporting data scientists. The limitations of the open source tools and Jenkins created a bottleneck that hampered Omega Point’s ability to deploy the Omega Point Portfolio Intelligence Platform.

Solution

Omega Point chose Databricks because it needed a production Spark platform that could fully support the needs of its data engineering and data science teams, enabling Omega Point to focus on development that adds direct value for customers.

“Databricks’ world-class product, Spark expertise, and quality of service have made them an indispensable partner in building a product with Apache Spark.”

– Omer Cedar, CEO, Omega Point

The Omega Point data engineering team deployed Databricks to power the backbone of its flagship product, the Omega Point Portfolio Intelligence Platform. Every day, Databricks pulls dozens of sources of data directly from Omega Point’s Amazon S3 account and, through a sequence of over 80 production Spark jobs, produces relevant indicators used to assess current economic and financial market trends in clients’ portfolios. This process employs fully automated job scheduling, monitoring, and cluster management without human intervention. Through its dashboard tool, Databricks also enables the team to debug the data pipeline more efficiently and find anomalies due to changes or revisions in the myriad of sources fed into the system each day.

To assess the quality of the raw sources and derived indicators in the context of the financial markets, the Omega Point data science team built a systematic learning environment in Databricks’ Integrated Workspace, which comes pre-installed with popular open source libraries such as numpy, pandas, scipy, sklearn, and matplotlib, in addition to Spark’s MLlib. The team was also able to easily customize Databricks to incorporate the SHOGUN machine learning library. To accelerate the iterative process of model development, the team used the rich built-in visualization capabilities to quickly summarize results and used matplotlib to draw more complicated figures, as both options are natively available in Databricks notebooks.

Benefits

Databricks sped up the release cycle, improved data throughput, and reduced the debugging time for the data engineering and science teams, enabling Omega Point to develop products with Apache Spark faster. Specifically:

Time to market of core features accelerated by over six months, fuelled by higher productivity.

Eliminated setup overhead. Databricks allows Omega Point to skip the overhead of setting up the work environment for new team members. New users can start producing useful work immediately, instead of installing various software packages or debugging their new environment.
Improved reproducibility of results. Reproducibility of results is a very important requirement for the data science team, as slight changes in the raw data, pre-processing steps, or learning algorithm parameterization may lead to variance in the outputs. Omega Point wants to be able to reproduce not only the code, but also the results from their machine learning projects. Databricks’ revision feature and the ability to back up the code on GitHub fulfil this essential requirement.
Increased code sharing and transparency. Omega Point teams are distributed around the world, which had impeded productivity because they lacked a common platform for development. Databricks allows team members anywhere in the world to directly access work in progress, making it easy to decipher existing code to fix bugs or re-use the work to build new capabilities.
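The reproducibility point above names three sources of variance: raw data, pre-processing steps, and algorithm parameters. As a minimal sketch of how a team might pin these down (not a description of Databricks’ revision feature or of Omega Point’s code), the example below fingerprints a run from its inputs and uses a fixed random seed so that the same inputs yield the same result. The names `run_fingerprint` and `train`, and the `scale` parameter, are hypothetical.

```python
import hashlib
import json
import random

def run_fingerprint(raw_rows, params, seed):
    """Hash the raw data, parameters, and seed so a run can be identified later."""
    payload = json.dumps(
        {"data": raw_rows, "params": params, "seed": seed},
        sort_keys=True,  # stable key order keeps the hash deterministic
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def train(raw_rows, params, seed):
    """Hypothetical stand-in for a model fit: a scaled mean plus seeded noise."""
    rng = random.Random(seed)  # local generator, unaffected by global state
    scaled = [x * params["scale"] for x in raw_rows]
    return sum(scaled) / len(scaled) + rng.gauss(0, 0.01)

# Illustrative inputs only
rows = [1.0, 2.0, 3.0, 4.0]
params = {"scale": 0.5}

first = train(rows, params, seed=42)
second = train(rows, params, seed=42)  # same inputs and seed, same result

same_id = run_fingerprint(rows, params, 42)
changed_id = run_fingerprint(rows, {"scale": 0.6}, 42)  # any change alters the hash
```

Storing the fingerprint next to each result makes it possible to tell whether two outputs came from identical data, parameters, and seed.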

Resilient, highly available, and automated Spark infrastructure reduced deployment and maintenance costs by 75% and increased production uptime by more than 80%.

Freed the time of three data engineering FTEs to focus on development instead of infrastructure maintenance.
Faster failure detection and remediation, because Databricks automatically monitors job status, recovers from failed jobs, and reports detailed debugging information.
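The monitoring and recovery behaviour described above can be pictured with a small, generic sketch. It does not use or represent any Databricks API. It is a plain Python illustration of running a sequence of jobs, retrying failures, and recording debugging information. The function `run_pipeline`, the job names, and the retry limit are all assumptions made for the example.

```python
def run_pipeline(jobs, max_retries=2):
    """Run (name, callable) jobs in order, retrying failures and building a status report."""
    report = []
    for name, job in jobs:
        attempts = 0
        while True:
            attempts += 1
            try:
                job()
                report.append({"job": name, "status": "succeeded",
                               "attempts": attempts, "error": None})
                break
            except Exception as exc:
                if attempts > max_retries:
                    # Out of retries: keep the error text for debugging
                    report.append({"job": name, "status": "failed",
                                   "attempts": attempts, "error": repr(exc)})
                    break
        if report[-1]["status"] == "failed":
            break  # assume later jobs depend on earlier ones, so stop here
    return report

# Hypothetical jobs: the first fails once and then succeeds on retry
calls = {"count": 0}

def flaky_job():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")

def steady_job():
    pass

report = run_pipeline([("load_sources", flaky_job),
                       ("build_indicators", steady_job)])
```

In this run the report shows `load_sources` succeeding on its second attempt and `build_indicators` on its first, which is the kind of per-job status a managed scheduler surfaces without manual checks.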

The Last Word

This case study highlights the benefits of a clear vision and understanding the latest releases of core applications that drive business value. It also highlights how the right application mix can be integrated to optimise the delivery of customer value.

At RoZetta, this approach to integrated solutions is championed by Peter Spicer, our Chief Technology Officer. Peter has a strong background in solution design, development and implementation in many countries across AsiaPac. RoZetta can add value to your business with the right cloud-based platform for managing and manipulating large volumes of data in a timely manner.

Some of the key take-outs from this case study:

In highly volatile trading environments, like capital markets, time is of the essence when tracking and measuring the impact of investment decisions.
Using the right mix of applications and platforms can free data scientists to focus on the science of analysing and modelling outcomes, confident that the data is taken care of.

These outcomes align with RoZetta’s commitment to create value for its clients and, in turn, their clients’ customers.

Key links from this case study: