Debezium Eco-system Tutorial (Part 2)

Introduction

This post continues from the earlier Part 1 blog. Part 2 has all the tutorial details on how to create a streaming CDC (Change Data Capture) pipeline using Debezium and Apache Kafka.

This blog describes the prerequisites, the installation steps, and how to configure the components to stream CDC data from a PostgreSQL RDBMS into Apache Kafka, where any consumer can then use it.

Past Context

Part 1 of the blog describes the Debezium ecosystem and the advantages of using Debezium as a CDC software component alongside Apache Kafka.

Prerequisites of Software Components

The following are the prerequisites for this blog, if you want to follow the steps for this development cluster. This tutorial uses the Windows operating system, but you can easily follow the same steps on macOS and the Linux OS variants.

Following are the prerequisite components:

  1. Download the Debezium PostgreSQL connector – see the link at the end of this tutorial.
  2. Install the PostgreSQL RDBMS – if not already installed.
  3. Download Apache Kafka – see the Download Kafka link at the end of this tutorial.

The list of commands is as follows:

Create the database inside the PostgreSQL RDBMS:

    CREATE DATABASE exampledatabase;

Switch the current database to exampledatabase:

    \c exampledatabase

Show all the JSON CDC payloads for the account table:

    bin\windows\kafka-console-consumer.bat --topic dbserver1.public.account --from-beginning --bootstrap-server localhost:9092

Show all the JSON CDC payloads for the transactions table:

    bin\windows\kafka-console-consumer.bat --topic dbserver1.public.transactions --from-beginning --bootstrap-server localhost:9092

List all the topics inside the Kafka cluster:

    bin\windows\kafka-topics.bat --bootstrap-server localhost:9092 --list

Installation Steps

The GitHub README.md file linked below has all the installation details.

README.md file location

Test the Running System

You can see the CRUD operations happening inside the RDBMS as JSON payloads in the Apache Kafka consumer window.

There are many Kafka consumers that can sink the JSON payload data generated by the cluster; all CDC data from the RDBMS can be consumed by them.
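As a sketch of such a consumer, the following Python snippet reads the account topic from this tutorial's cluster using the kafka-python package. The topic name and broker address come from the commands above; the function names are illustrative, and you would adapt the settings to your own setup.

```python
# Minimal sketch of a Kafka consumer for the Debezium CDC topics in this
# tutorial. Assumes the kafka-python package (pip install kafka-python);
# topic and broker address follow the tutorial's setup.
import json

def decode_event(raw_value):
    """Decode one Debezium message value into a dict (None for tombstone messages)."""
    if raw_value is None:
        return None
    return json.loads(raw_value.decode("utf-8"))

def stream_account_changes():
    """Print the 'after' state of every change event on the account topic."""
    from kafka import KafkaConsumer  # kafka-python package

    consumer = KafkaConsumer(
        "dbserver1.public.account",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",  # same effect as --from-beginning
    )
    for message in consumer:
        event = decode_event(message.value)
        if event is not None:
            print(event["payload"]["after"])
```

Calling stream_account_changes() requires the Kafka broker and connector from the installation steps to be running; decode_event on its own works with any Debezium JSON message value.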

Sample JSON CDC Data Output  

"payload": {
    "before": null,
    "after": {
        "id": 1,
        "full_name": "John Doe",
        "date_of_birth": 5644,
        "address1": "123 Main St, Apt 4B",
        "pin_code": "10001",
        "created_on": 1732796365350222,
        "updated_on": 1732796365350222
    }
}
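Note the numeric temporal fields in the sample: assuming Debezium's default time handling, a DATE column such as date_of_birth is emitted as days since the Unix epoch, and a microsecond-precision timestamp such as created_on as microseconds since the Unix epoch. A small sketch of decoding them (field names taken from the sample above):

```python
# Sketch: decode the temporal fields from the sample Debezium payload.
# Assumes Debezium's default encodings: io.debezium.time.Date is days
# since the Unix epoch; io.debezium.time.MicroTimestamp is microseconds
# since the Unix epoch.
from datetime import date, datetime, timedelta, timezone

def decode_debezium_date(days_since_epoch):
    """io.debezium.time.Date -> datetime.date"""
    return date(1970, 1, 1) + timedelta(days=days_since_epoch)

def decode_micro_timestamp(micros_since_epoch):
    """io.debezium.time.MicroTimestamp -> timezone-aware datetime (UTC)"""
    return datetime.fromtimestamp(micros_since_epoch / 1_000_000, tz=timezone.utc)

# Values from the sample "after" image above.
after = {"date_of_birth": 5644, "created_on": 1732796365350222}

print(decode_debezium_date(after["date_of_birth"]))   # the birth date as a calendar date
print(decode_micro_timestamp(after["created_on"]))    # the creation time in UTC
```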

Configuration Items

The configuration items are available on GitHub at this link:

All the Debezium-related config files
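For orientation, a Debezium PostgreSQL connector registration typically looks like the sketch below. The server name matches the dbserver1 topics used in this tutorial, but the hostname, port, credentials, and connector name here are placeholder assumptions – use the values from the linked config files. (In Debezium 2.x, the server-name property is called topic.prefix instead of database.server.name.)

```json
{
  "name": "exampledatabase-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "<your-password>",
    "database.dbname": "exampledatabase",
    "database.server.name": "dbserver1"
  }
}
```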

Conclusion

Debezium can capture CDC data from the RDBMS and stream it to Kafka. Multiple Kafka consumers can then consume the CDC data – for example, BI and analytics tools, or other event-driven microservices powering reports and dashboards.

External References

  1. Debezium connector for PostgreSQL
  2. Debezium Documentation
  3. Debezium Part 1 Blog
  4. Download Kafka

Debezium Part 3 to follow

The final part of the blog series takes this setup to the AWS Cloud. This will help to stream CDC data from an RDBMS on the AWS RDS service to Kafka and its consumers.

