Case Study: Large Online Retail Customer – Change Data Capture (CDC) & Database Replication Across Cloud Platforms

By Varun Kumar | @intelia | April 18

Using the latest streaming data pipeline technology, intelia implemented a real-time, fully managed and scalable solution that ingests CDC changes from OLTP databases in AWS into an analytics warehouse in GCP (BigQuery), improving business reporting and performance while maintaining history.

The challenge

The customer faced a business reporting challenge: critical reports were out of sync and, in some cases, took too long to load, causing operational issues. The existing OLTP database on AWS could not handle the rising demands of real-time reports and ML processes.

Additionally, frequent schema changes caused issues with the reliability and quality of the data flowing between the data sources and the data consumers.

A fully managed solution that enabled data contracts was needed.

The solution

The proposed solution was to separate the databases used for reporting from those used for transactions, utilising the power of BigQuery along with Pub/Sub to enforce schemas.

intelia used Pub/Sub BigQuery subscriptions with topic schemas to ingest CDC data in real time. Enforcing schemas on the topics maintains a data contract between the source systems and their downstream consumers. A process to evolve the schemas in production without any downtime was also implemented using Cloud Functions.
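
To illustrate what a data contract on a topic means in practice, here is a minimal standard-library sketch of the two checks involved: rejecting a CDC message that does not match the agreed schema, and confirming that a new schema revision does not break existing consumers. This is not intelia's implementation and it does not call the Pub/Sub API. The field names, the `op` operation codes and the rule that a revision may only add optional fields are all assumptions made for the example.

```python
from typing import Any

# Hypothetical contract for an "orders" CDC topic: field -> (type, required).
ORDERS_V1: dict[str, tuple[type, bool]] = {
    "order_id": (int, True),
    "status": (str, True),
    "op": (str, True),          # assumed CDC operation code, e.g. I, U or D
    "updated_at": (str, True),
}

# A later, hypothetical revision that only adds an optional field.
ORDERS_V2 = {**ORDERS_V1, "coupon_code": (str, False)}


def violations(record: dict[str, Any],
               schema: dict[str, tuple[type, bool]]) -> list[str]:
    """Return the reasons a record breaks the contract (empty means valid)."""
    problems = []
    for name, (expected, required) in schema.items():
        if name not in record:
            if required:
                problems.append(f"missing required field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for field: {name}")
    # Fields the contract does not know about are rejected as well.
    for name in record:
        if name not in schema:
            problems.append(f"unexpected field: {name}")
    return problems


def is_backward_compatible(old: dict[str, tuple[type, bool]],
                           new: dict[str, tuple[type, bool]]) -> bool:
    """True if every old field is unchanged and every added field is optional."""
    for name, spec in old.items():
        if new.get(name) != spec:
            return False
    return all(not required
               for name, (_, required) in new.items()
               if name not in old)
```

A message such as `{"order_id": "42", ...}` would be rejected because `order_id` is not an integer, which is the kind of drift a contract is meant to stop at the source rather than in a downstream report. The compatibility check reflects one common evolution policy; the policy used in the actual project is not described in this case study.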

A process to ingest backfill data in batch was also designed using AWS DMS, Storage Transfer Service and Cloud Functions.
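
One small piece of such a backfill flow can be sketched as follows: once exported files have been copied across clouds, something has to decide which warehouse table each file belongs to. The sketch below assumes that DMS wrote its full-load output to object storage using a `schema/table/LOAD*.csv` layout and that this layout was preserved by the transfer. The function name, the `raw_backfill` dataset and the `schema__table` naming are hypothetical and are not taken from the project.

```python
from pathlib import PurePosixPath


def destination_table(object_name: str, dataset: str = "raw_backfill") -> str:
    """Map '<prefix>/<schema>/<table>/LOAD00000001.csv' to '<dataset>.<schema>__<table>'.

    The object layout and the destination naming are assumptions for this sketch.
    """
    parts = PurePosixPath(object_name).parts
    if len(parts) < 3:
        raise ValueError(f"expected <schema>/<table>/<file>, got: {object_name}")
    schema, table, filename = parts[-3], parts[-2], parts[-1]
    # Only full-load files are treated as backfill; anything else is refused.
    if not filename.startswith("LOAD"):
        raise ValueError(f"not a full-load file: {filename}")
    return f"{dataset}.{schema}__{table}"
```

In a function triggered by a new object arriving in the destination bucket, the returned name would identify the table for a batch load job, keeping the backfill path separate from the streaming path.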

The results

The customer now has a fully managed, scalable data pipeline that replicates data across clouds with millisecond latencies to power reporting dashboards and other analytical workloads while providing data lineage.

Data contracts are also in place, which will help the customer move towards a data mesh approach: domain-driven architecture, self-serve data platforms and federated governance.

About the customer

The customer is a Sydney-based, Australian online fashion and sports retailer.

Industry: Retail & Wholesale

Primary project location: Australia