Introduction
This is a continuation of the earlier Part 1 blog. Part 2 is a hands-on tutorial on building a streaming CDC (Change Data Capture) pipeline using Debezium and Apache Kafka.
It covers the prerequisites, the installation steps, and how to configure the components so that CDC data streams from a PostgreSQL RDBMS into Apache Kafka, where any consumer can use it.
Past Context
Part 1 of the blog describes the Debezium ecosystem and the advantages of using Debezium as a CDC software component alongside Apache Kafka.
Prerequisites of Software Components
The following prerequisites apply if you want to follow the steps and build this development cluster yourself. This tutorial uses the Windows operating system, but the same steps work on macOS and Linux variants.
Following are the prerequisite components:
- Download the Debezium PostgreSQL connector – see the link at the end of this tutorial.
- Install the PostgreSQL RDBMS, if not already installed.
- Download Apache Kafka from the official website – Download Kafka
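One server-side prerequisite worth calling out: Debezium's PostgreSQL connector reads changes through PostgreSQL's logical decoding, so the database must emit logical WAL records before the connector can capture anything. A minimal sketch of the relevant `postgresql.conf` settings (the capacity values below are illustrative, not tuned recommendations; a server restart is required after changing them):

```ini
# postgresql.conf — enable logical decoding for Debezium
wal_level = logical

# Illustrative capacity settings; adjust for your environment
max_wal_senders = 4
max_replication_slots = 4
```

You can confirm the setting took effect by running `SHOW wal_level;` in psql, which should return `logical`.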
The list of commands is as follows:

| Command | Description |
| --- | --- |
| `CREATE DATABASE exampledatabase;` | Creates the database inside the PostgreSQL RDBMS. |
| `\c exampledatabase` | Switches the current database to `exampledatabase`. |
| `bin\windows\kafka-console-consumer.bat --topic dbserver1.public.account --from-beginning --bootstrap-server localhost:9092` | Shows all the JSON CDC payloads for the `account` table. |
| `bin\windows\kafka-console-consumer.bat --topic dbserver1.public.transactions --from-beginning --bootstrap-server localhost:9092` | Shows all the JSON CDC payloads for the `transactions` table. |
| `bin\windows\kafka-topics.bat --bootstrap-server localhost:9092 --list` | Lists all the topics inside the Kafka cluster. |
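The topic names `dbserver1.public.account` and `dbserver1.public.transactions` imply tables in PostgreSQL's `public` schema. As an illustration only (the actual DDL for this tutorial is in the GitHub repository; the column types here are inferred from the sample CDC payload shown later), the `account` table might look like:

```sql
-- Hypothetical DDL inferred from the sample CDC payload; the real
-- schema for this tutorial lives in the GitHub repository.
CREATE TABLE account (
    id            SERIAL PRIMARY KEY,
    full_name     TEXT NOT NULL,
    date_of_birth DATE,
    address1      TEXT,
    pin_code      VARCHAR(10),
    created_on    TIMESTAMP DEFAULT now(),
    updated_on    TIMESTAMP DEFAULT now()
);

-- Any INSERT (or UPDATE/DELETE) now emits a CDC event
-- on the dbserver1.public.account topic:
INSERT INTO account (full_name, date_of_birth, address1, pin_code)
VALUES ('John Doe', '1985-06-15', '123 Main St, Apt 4B', '10001');
```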
Installation Steps
The GitHub README.md file, linked below, has all the installation details.
Test the Running System
You can see the CRUD operations happening inside the RDBMS appear as JSON payloads in the Apache Kafka consumer window.
Many different Kafka consumers can sink the JSON payload data generated by the cluster; all CDC data from the RDBMS is available to any of them.
Sample JSON CDC Data Output
```json
"payload": {
  "before": null,
  "after": {
    "id": 1,
    "full_name": "John Doe",
    "date_of_birth": 5644,
    "address1": "123 Main St, Apt 4B",
    "pin_code": "10001",
    "created_on": 1732796365350222,
    "updated_on": 1732796365350222
  }
}
```
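Note how Debezium has already converted the temporal columns in this sample: with its default time handling, a `DATE` column is encoded as days since the Unix epoch (`io.debezium.time.Date`) and a microsecond-precision `TIMESTAMP` as microseconds since the epoch (`io.debezium.time.MicroTimestamp`). A consumer therefore has to decode them; a minimal, self-contained Python sketch:

```python
import json
from datetime import date, datetime, timedelta, timezone

# The sample CDC payload fragment from above, as emitted by Debezium's
# JSON converter.
raw = """
{
  "payload": {
    "before": null,
    "after": {
      "id": 1,
      "full_name": "John Doe",
      "date_of_birth": 5644,
      "address1": "123 Main St, Apt 4B",
      "pin_code": "10001",
      "created_on": 1732796365350222,
      "updated_on": 1732796365350222
    }
  }
}
"""

event = json.loads(raw)
after = event["payload"]["after"]

# DATE columns arrive as days since the Unix epoch (io.debezium.time.Date).
dob = date(1970, 1, 1) + timedelta(days=after["date_of_birth"])

# TIMESTAMP columns arrive as microseconds since the epoch (MicroTimestamp).
created = datetime.fromtimestamp(after["created_on"] / 1_000_000, tz=timezone.utc)

print(after["full_name"], dob, created.isoformat())
```

The same decoding applies to every row-level event the consumer reads from the `dbserver1.public.account` topic, whichever field (`before` or `after`) is populated.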
Configuration Items
All the configuration items are available on GitHub at this link:
Debezium-related config files
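For reference, a typical Debezium PostgreSQL connector registration looks like the sketch below; it would be POSTed to the Kafka Connect REST endpoint (by default `http://localhost:8083/connectors`). The connection values and connector name here are placeholders, the `topic.prefix` of `dbserver1` simply matches the topic names used earlier, and the exact values for this tutorial are in the GitHub config files.

```json
{
  "name": "exampledatabase-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "exampledatabase",
    "topic.prefix": "dbserver1"
  }
}
```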
Conclusion
Debezium can capture CDC data from the RDBMS and stream it to Kafka, where multiple Kafka consumers can consume it: for example, BI and analytics tools, or other event-driven microservices that power reports and dashboards.
External References
Debezium Part 3 to follow
The final part of the blog takes this setup to the AWS Cloud. That will help stream CDC data from an RDBMS running on the AWS RDS service to Kafka and its consumers.