Member-only story

Change Data Capture(CDC) With DB

Suraj Mishra
3 min readJul 18, 2020

--

In this blog we will see how to capture delete operations from database when we build data pipeline.

https://unsplash.com/photos/9AxFJaNySB8

Introduction

When we build a data pipeline the first step in the process is to identify the data source and how to extract that data in order to process.

During data extraction INSERT, UPDATE, SELECT queries can be performed without any concern but DELETE operation might be something we should think twice. DELETE operation might be hard delete and once data is deleted from the source we cannot capture it in data pipeline.

In this article, I will explain 3 ways to capture DELETE operation during the data extraction process.

# 1: Common Solution

If the source table in the database has update_timestamp then we can query the database based on what has changed since your last job run.

SELECT * FROM DB.table WHERE update_timestamp>last_run_time

In this case, you can extract soft delete(generally with _isdeleted=true flag) operation but we cannot capture hard delete operation.

Also, if update_timestamp is not indexed, running the above query can cause performance issues as well.

#2: Trigger Table

--

--

Suraj Mishra
Suraj Mishra

Written by Suraj Mishra

Staff Software Engineer @PayPal ( All opinions are my own and not of my employer )

No responses yet