Operational Data Stores for Analytics Workloads
By Sandip Chaturvedi & Varun Kumar | @intelia | December 2
Context
Since the concept first appeared, the Operational Data Store (ODS) has taken many shapes and forms across organisations, addressing different business problems. Here we review how analytical operational data solutions can help businesses reduce the time it takes to answer questions of their data, and give the business a new set of tools and technologies to innovate with. We also highlight a specific use case we solved for one of our customers using an ODS.
Business needs, challenges and constraints
In the context of data and analytics, an operational data store sits between the front-line applications and the enterprise data warehouse. Operational data from the front-line applications is ingested from multiple sources and integrated in the ODS in near real time. An ODS focuses on operational business intelligence, analytics and near-real-time analysis.
Generally, the need for an ODS is driven by the following key factors:
- Front-line enterprise applications are rigid and complex, which limits the level of reporting and analytics they can support
- The data volume is too high, which compounds challenges such as data quality, consistency, broken application processes, performance and data frequency
- Actionable insights arrive late because operational activities take priority over reporting and analytics
- The need to handle a variety of data, history, heavy transformations and aggregations
- Operational and analytics needs that span business units
While it is very important to identify use cases for an ODS within the enterprise, it is just as important to make sure the ODS isn't doing the job of the front-line enterprise applications.
Recently, we helped one of our customers overcome challenges with their key business reporting by developing an operational data store for their data and analytics workloads on Google Cloud Platform (GCP).
The key business issues they were facing included:
- Delay in actionable information – Reports were refreshed only once every 24 hours by a batch process, so the data was always up to a day behind.
- Suboptimal performance – Some reports took too long to load, and the underlying database could not handle the query load, which was impacting day-to-day operations.
- Missing historical perspective – There was no historical view of the data, including the changes made to each dataset, which the business needed for analysis and compliance purposes.
- Missing unified view – They were unable to combine data from other databases to get a unified view of business operations.
Our Solution Features
At a high level, we used Google Cloud Platform (GCP) services alongside open-source tooling to:
- Capture application changes at the database level from the source databases, using Debezium, an open-source change data capture (CDC) platform, and
- Stream those changes into BigQuery using Google Cloud Pub/Sub and Dataflow.
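To make the capture-and-stream steps more concrete, here is a minimal Python sketch of the kind of transform a Dataflow job might apply to each Pub/Sub message before writing it to BigQuery. It assumes Debezium publishes its standard JSON change-event envelope (`before`, `after`, `source`, `op`), either bare or wrapped in a `payload` field when schemas are enabled. The function name and the `_op`, `_db`, `_table` and `_source_ts_ms` output columns are hypothetical, not the customer's actual schema. Each change becomes a new row rather than an overwrite, which is one way to keep the history the business needed.

```python
import json
from typing import Any, Optional

# Debezium operation codes: c = create, u = update, d = delete, r = snapshot read.
OP_NAMES = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT"}


def flatten_change_event(message: bytes) -> Optional[dict[str, Any]]:
    """Turn one Debezium JSON change event into a flat, append-only row.

    The metadata column names (_op, _db, _table, _source_ts_ms) are
    hypothetical; adapt them to your own BigQuery schema.
    """
    event = json.loads(message)
    # With schemas enabled, the envelope is wrapped in a "payload" field.
    payload = event.get("payload", event) if isinstance(event, dict) else None
    if not payload:
        # A tombstone (null value) carries no row data, so there is nothing to load.
        return None
    op = payload.get("op")
    if op not in OP_NAMES:
        # Other events (e.g. truncate) are skipped in this sketch.
        return None
    # Deletes only carry the "before" image; everything else uses "after".
    image = payload.get("before") if op == "d" else payload.get("after")
    source = payload.get("source") or {}
    row = dict(image or {})
    row["_op"] = OP_NAMES[op]
    row["_db"] = source.get("db")
    row["_table"] = source.get("table")
    row["_source_ts_ms"] = source.get("ts_ms")
    return row
```

In an Apache Beam pipeline, a function like this would typically sit in a `beam.Map` or `beam.FlatMap` step between `beam.io.ReadFromPubSub` and `beam.io.WriteToBigQuery`, with the `None` results filtered out before the write. Recording the source database and table on every row also makes it straightforward to land changes from several databases side by side.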
Benefits
With this approach, we achieved the following benefits for the business:
- Improved data availability – Data in BigQuery is now available in near real time.
- High performance – Business reports now use BigQuery as their source, and BigQuery's scalability and performance helped meet the performance requirements.
- Reduced load – Load on the applications' OLTP databases has dropped significantly because business reporting now runs against BigQuery.
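Near-real-time availability and the historical view both depend on how the streamed changes are modelled in BigQuery. A common pattern, assumed here rather than taken from the project, is to keep every change as an append-only row and derive the current (or a past) state by replaying the log. The minimal sketch below does that replay in plain Python over rows carrying hypothetical `_op` and `_source_ts_ms` metadata columns. It orders by source timestamp for simplicity, whereas a production pipeline would more likely order by a log position such as a binlog offset or LSN. In BigQuery itself, the same idea could be expressed as a view that keeps the latest row per key with a window function such as `ROW_NUMBER()`.

```python
from typing import Any, Iterable, Optional


def state_as_of(
    rows: Iterable[dict[str, Any]],
    key: str = "id",
    as_of_ms: Optional[int] = None,
) -> dict[Any, dict[str, Any]]:
    """Replay an append-only change log into a table's state.

    With as_of_ms=None this returns the current state; otherwise it returns
    the state as it stood at that source timestamp (a point-in-time view).
    """
    state: dict[Any, dict[str, Any]] = {}
    # Replay changes in source order; sorted() is stable, so ties keep input order.
    for row in sorted(rows, key=lambda r: r["_source_ts_ms"]):
        if as_of_ms is not None and row["_source_ts_ms"] > as_of_ms:
            break
        if row["_op"] == "DELETE":
            state.pop(row[key], None)
        else:
            # Assumes change-metadata columns are prefixed with "_" (hypothetical
            # convention), so they are dropped from the reconstructed record.
            state[row[key]] = {k: v for k, v in row.items() if not k.startswith("_")}
    return state
```

Calling it without `as_of_ms` gives the "current" table that reports would read, while passing a timestamp answers the kind of "what did this look like then?" question that analysis and compliance needs raise.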
Conclusion
Using Google Cloud Platform, we solved the business problem by providing near-real-time insights on production data without impacting the production applications themselves.
Please feel free to reach out with your comments, if you have a similar challenge, or if you want to chat about where an ODS could fit in your modern data stack.