Big Data Analysis Projects Scale Six-Fold

Celtra position themselves in the market as providing:

CREATIVE MANAGEMENT PLATFORM
Cloud-based, self-service creative technology used to measurably improve the effectiveness of digital advertising.


Jaka Janear, Chief Technology Officer at Celtra, explains how the combination of Databricks and Apache Spark delivered the scale needed to manage and process the exponential increase in data points created in digital advertising. This data was the currency used to create value for their customers. The solution was also more cost-effective for the business.

The new environment also resulted in productivity increases, particularly in the initial stages of investigating new products and services.


All these business benefits came from integrating the right applications on the right platform using the most effective data management tools.

Summary

  •  Celtra relies on data analytics to inform product design, troubleshoot anomalies, and fine-tune the performance of its display advertising software platform capabilities.
  •  Celtra encountered difficulties in meeting the rising demand for data analysis due to the large scale of the data, diversity of data sources, and small size of the analytics team.
  •  Celtra selected Databricks as its data processing platform, enabling teams from Engineering, Product Management, and QA to work directly with data and perform the required analysis.

Benefits

  •  Increased the amount of ad-hoc analysis done six-fold, leading to better informed product design and quicker issue detection and resolution.
  •  Reduced the load on the analytics engineering team by increasing the number of people able to work with the data directly by a factor of four.
  •  Increased collaboration and improved reproducibility and repeatability of analyses.
  •  Reduced the cost of cloud infrastructure through faster and easier management of Apache® Spark™ clusters.

Business Background

Celtra provides agencies, media suppliers and brand leaders alike with an integrated, scalable HTML5 technology for brand advertising on smartphones, tablets, and desktop.

The platform, AdCreator 4, gives clients such as MEC, Kargo, Pepsi and Macy’s the ability to easily create, manage, and traffic sophisticated data-driven dynamic ads, optimize them on the go and track their performance with insightful analytics.

A wide variety of data is collected by Celtra, including data related to internal company processes, data based on the usage of the product by clients and, most importantly, data focused on the engagements of consumers with their clients' ads.

In addition to providing analytics to its clients, Celtra is constantly exploring new ways to leverage this gathered information to improve its offering, for example:

  • Product usage analysis: Analysing feature adoption, usage patterns, and support cases to direct further development focus.
  •  Environment analysis: Assessing the feasibility of new product concepts and detecting trends by analyzing the context in which Celtra's ads run, such as the publisher and device of choice.
  •  Technical performance: Monitoring load times of ads closely across multiple dimensions, such as ad complexity, geography, connectivity, and CDNs. Most recently, Celtra has been evaluating the performance benefits of SPDY and HTTP/2 for improved page load times.
  •  Quality Control: Computing key performance metrics to detect issues at deployment, and automatically flagging anomalies to catch regressions that would otherwise get lost in the averages.
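
The point about regressions getting "lost in the averages" can be made concrete with a small sketch. The Python below is not Celtra's code: the CDN names, load times, the 20% threshold, and the function names are all invented for illustration. It compares mean ad load time before and after a release, both overall and per CDN, and shows how a large regression on a low-traffic segment barely moves the overall figure.

```python
from collections import defaultdict
from statistics import mean

# Synthetic samples: (release, cdn, load time in ms). All values are invented.
# cdn_a carries most of the traffic; cdn_b regresses after the release.
SAMPLES = (
    [("before", "cdn_a", 400)] * 9 + [("before", "cdn_b", 400)] * 1
    + [("after", "cdn_a", 400)] * 9 + [("after", "cdn_b", 600)] * 1
)


def mean_by_segment(samples, release):
    """Return {cdn: mean load time} for one release."""
    buckets = defaultdict(list)
    for rel, cdn, load_ms in samples:
        if rel == release:
            buckets[cdn].append(load_ms)
    return {cdn: mean(values) for cdn, values in buckets.items()}


def overall_change(samples):
    """Relative change in mean load time across all segments combined."""
    before = mean(ms for rel, _, ms in samples if rel == "before")
    after = mean(ms for rel, _, ms in samples if rel == "after")
    return after / before - 1


def regressed_segments(samples, threshold=0.20):
    """Return {cdn: relative increase} for segments slower by more than threshold."""
    before = mean_by_segment(samples, "before")
    after = mean_by_segment(samples, "after")
    return {
        cdn: after[cdn] / before[cdn] - 1
        for cdn in before
        if cdn in after and after[cdn] > before[cdn] * (1 + threshold)
    }


if __name__ == "__main__":
    print(f"overall change: {overall_change(SAMPLES):.1%}")
    print("regressed segments:", regressed_segments(SAMPLES))
```

With these made-up numbers the overall mean rises by about 5%, which is easy to dismiss as noise, while the per-segment view flags a 50% slowdown on one CDN.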

Challenge

As Celtra’s business grew, it struggled to meet the corresponding increase in demand for analytics for three reasons:

  1. Diversity of data sources: The production and engineering data from Celtra's systems are scattered in different locations. Celtra did not have an easy way to combine the data from these disparate data sources and perform the necessary analysis in a single analytics platform.
  2. Large scale of data: Celtra’s production systems generate tens of terabytes of data monthly. Although Celtra has used Spark as its data processing platform since its early days and has accumulated considerable expertise, this knowledge was limited to the team working on the analytics portion of the product.
  3. Small analytics team: The analytics team consisted of only four people, who quickly became the bottleneck to service requests from Product Management, Engineering, and QA.

To overcome these challenges, Celtra needed a powerful data platform capable of integrating data from disparate data sources while being fast enough to support interactive analysis at terabyte scale. The platform also had to be user-friendly enough to empower teams outside of analytics to perform analysis themselves, removing the bottleneck created by its small analytics team.

Solution

Celtra adopted Databricks as its centralised analytics platform because its key features addressed all of Celtra’s needs. Apache Spark is an open source big data processing framework built for speed and scale. Databricks made Spark much easier to deploy by combining the power of Spark with a zero-management hosted platform on Amazon Web Services (AWS), allowing Celtra to take advantage of Spark without the DevOps burdens typically associated with big data infrastructure.

  • Seamless connection to diverse data sources: Databricks provided built-in APIs to access data from AWS S3 and relational databases. Since the full power of Scala is available in Databricks, data from various web service APIs could be accessed as well. Celtra could seamlessly connect its data by consolidating the disparate sources in Databricks.
  • Reuse of production code in ad-hoc analyses: Since Databricks is based on Apache Spark, similar to Celtra’s production analytics pipeline, a lot of production code could be reused as the foundation for ad-hoc analyses instead of rewriting code in another framework.
  • User-friendly interactive workspace: Databricks included an intuitive, multiuser interactive workspace for real-time analysis and visualization, enabling teams other than analytics to work with data directly in a single, easy-to-use environment.
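
The idea of consolidating disparate sources in one place can be illustrated without Spark. The sketch below is a hypothetical, standard-library stand-in and not Celtra's pipeline: an in-memory SQLite table plays the role of a relational database, a JSON-lines string plays the role of log files in object storage such as S3, and the two are joined in a single function. Table names, field names, and values are all invented.

```python
import json
import sqlite3
from collections import Counter

# Stand-in for event logs kept in object storage (one JSON record per line).
EVENT_LOG = """\
{"campaign_id": 1, "event": "impression"}
{"campaign_id": 1, "event": "interaction"}
{"campaign_id": 2, "event": "impression"}
{"campaign_id": 1, "event": "impression"}
"""


def load_campaigns():
    """Read {campaign_id: client} from a stand-in relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE campaigns (id INTEGER PRIMARY KEY, client TEXT)")
    conn.executemany(
        "INSERT INTO campaigns VALUES (?, ?)",
        [(1, "client_x"), (2, "client_y")],
    )
    rows = dict(conn.execute("SELECT id, client FROM campaigns"))
    conn.close()
    return rows


def impressions_per_client(log_text, campaigns):
    """Join log events to campaign metadata and count impressions per client."""
    counts = Counter()
    for line in log_text.splitlines():
        record = json.loads(line)
        if record["event"] == "impression":
            counts[campaigns[record["campaign_id"]]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(impressions_per_client(EVENT_LOG, load_campaigns()))
```

In a Spark-based setup the same join would typically be expressed over distributed datasets, but the shape of the analysis is the same: load each source, then combine them on a shared key.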

With the adoption of Databricks, Celtra has enabled teams from Engineering, Product Management, and QA to perform complex data analysis on their own, leveraging the massive production data to improve product design, address anomalies rapidly, and fine-tune the performance of production systems.

Benefits

The most important benefit Celtra gained from deploying Databricks is the removal of the bottleneck within its analytics team, allowing it to meet the surging demand for big data analysis across the company. Since its introduction, Databricks has been used by over a third of the technical staff in Engineering, QA and Product Management. As a result of empowering them to work with the data directly, many more questions have been asked and hypotheses tested, leading to better informed product design and quicker issue detection and resolution. In the first four months after adopting Databricks, Celtra increased the amount of analysis done and insights obtained sixfold, and increased the number of people working with its most valuable data fourfold.


“The notebooks feature in Databricks encourages good documentation by automatically recording the code written during an ad hoc analysis session. This has had profound effects for us, from increasing collaboration and improving reproducibility to making analysis more approachable to a wider audience, who can start off by cloning someone else’s research.”
– Jaka Janear, Chief Technology Officer at Celtra


Aside from dramatically boosting the amount of analytics done, Celtra also experienced two additional benefits from using Databricks:

  1. Improved collaboration and reproducibility: The self-documenting nature of notebooks in Databricks meant that ad-hoc analysis code was automatically stored in a centralized location. This feature encouraged teams to leverage the existing codebase instead of duplicating past efforts in writing new code, eventually leading to a maintainable collective codebase for ad-hoc analysis. Additionally, by having all work stored by default, past results could be easily reproduced in cases where additional insight was needed.

  2. Reduced cloud infrastructure cost: The faster and easier provisioning, resizing, and deprovisioning of Spark clusters made Celtra engineers more comfortable with shutting down unused clusters whenever possible. Agility in cluster management also facilitated the use of Spot Instances by making their use less risky. When combined with the “Jobs” feature of Databricks, Celtra was able to substantially reduce the cost of its cloud infrastructure by scheduling long-running jobs that automatically provision and deprovision clusters as needed.

    “Databricks is used by over a third of our technical staff, from engineering to product management, to help us make smart, data-driven decisions. After implementation, the amount of analysis performed has increased sixfold, meaning more questions are being asked, more hypotheses tested.”
    – Jaka Janear, Chief Technology Officer at Celtra

The Last Word

There are certainly a lot of challenges when big data becomes an essential element of a traditional business.  Deciding how to cope with an enormous increase in data, and then turning this new data into new products and services can be daunting.  Celtra found a solution in combining the power, efficiency and flexibility of Databricks when integrated with Apache Spark.

At RoZetta, Peter Spicer, our Chief Technology Officer, has the knowledge and experience to help you develop an equally effective solution.

Peter has a strong background in solution design, development and implementation in many countries across AsiaPac. RoZetta can add value to your business with the right cloud-based platform for managing and manipulating large volumes of data in a timely manner.

Some of the key take-outs from this case study:

  • The incredible increase in volume of data available in all markets and industries needs special consideration when deciding how to use it to the benefit of your customers.  
  • Using the right mix of applications and platforms can result in more cost-effective infrastructure, but also to pro

These outcomes align with RoZetta’s commitment to create value for its clients, so that they, in turn, can create value for their customers.

Key links from this case study: