In this post we cover the built-in SQL Server CDC functionality, how to set it up, and some pros and cons of this feature. We also introduce Striim’s high-speed, reliable SQL Server CDC solution.
What is Change Data Capture?
Change Data Capture (CDC) is a software process that identifies and tracks changes in a database. There are various CDC methods, each with their pros and cons, but the main benefit of CDC is that it enables near-real-time data movement from databases. Continuous data capture is important for both analytics use cases (where timely data is used to make decisions) and zero-downtime cloud migrations (where operational databases can’t be taken offline for extended periods).
What is SQL Server?
Microsoft SQL Server is a relational database management system (RDMS) developed by Microsoft. SQL Server is mainly used to store and retrieve data. For the purposes of analysis and business intelligence, data is typically replicated to a data warehouse optimized for large analytics queries.
Change Data Capture is an ideal solution for replication of SQL Server data in near real-time. In this post, we’ll share an overview of SQL Server’s native CDC functionality and compare it to Striim’s high-speed SQL Server CDC reader.
SQL Server CDC Methods
Native SQL Server CDC
SQL Server offers a native CDC functionality, as depicted in the diagram below.
Image source: Microsoft documentation
The source table contains the data that’s being inserted, updated, and deleted by applications. The SQL Server transaction log keeps a record of the changes. If CDC is enabled, a SQL Server agent reads inserts, updates, and deletes from the transaction log. These changes are added to a separate change table, where query functions are used to capture the changes so they can be delivered to a target data warehouse or other destination via a data integration process like ETL.
What are the benefits and drawbacks of the native SQL Server Change Data Capture feature?
Built-in feature, no external tools needed
Uses familiar T-SQL language
Adding a change table adds overhead to the source database
Querying change tables adds latency
Change tables need to be cleaned up on a regular basis
Since a change table is created for each CDC-enabled table, it can be challenging to consolidate changes from multiple tables (and databases)
Tutorial: How to set up the native SQL Server CDC functionality
This tutorial describes how to set up change data capture (CDC) in SQL Server using Microsoft’s AdventureWorks sample database to illustrate the steps.
For additional information see SQL Server setup in the documentation.
The following are required to complete this tutorial:
Existing SQL Server running instance
SQL Server management tools
(Optional) AdventureWorks sample database
Set up Database-Level CDC
CDC must be enabled both at the database and table level in SQL Server. Database-level CDC functionality is enabled by setting is_cdc_enabled to 1 in the sys.databases table.
Invoke the following commands to check the current state of database-level CDC functionality:
Locate your database (e.g., AdventureWorks2019) in the resultset. If is_cdc_enabled is 0 as shown below, then CDC is not currently enabled at the database level:
Invoke the following commands to enable database-level CDC. These commands use SQL Server’s built-in sys.sp_cdc_enabled_dbd stored procedure to perform the necessary updates:
Re-invoke the verification commands from the start of this section to verify that is_cdc_enabled has is now set to 1:
Set up Table-Level CDC
In addition to setting up database-level CDC, you must also enable CDC on a per-table basis.
This section describes how to set up and verify table-level CDC on the AdventureWorks database’s DimAccount table.
Invoke the following commands to set up CDC on the DimAccount table. Note the source_schema parameter must be assigned the name of the schema to which the table (DimAccount) belongs, in this case dbo:
@source_schema = N’dbo’,
@source_name = N’DimAccount’,
@role_name = NULL
Verify that CDC is enabled for the DimAccount table by checking that is_tracked_by_cdc is set to 1 for the database in sys.tables.
Invoke the following command to return all tables listed in sys.tables:
SCHEMA_NAME(Schema_id) As [Schema Name],
name As [Table Name],
Locate the row for the DimAccount table and verify that is_tracked_by_cdc is set to 1:
This process results in the creation of a system table prefixed with cdc. called cdc.dbo_DimAccount_CT that can be seen in Object Explorer and further verified by querying the table.
Invoke the following command to verify that cdc.Dbo_DimAccount_CT can be queried and that a row is returned:
SELECT * from cdc.Dbo_DimAccount_CT
You have now successfully configured CDC for a database and table.
Striim for High Speed, Reliable SQL Server CDC
Striim is a real-time data integration platform that natively performs change data capture by mining database change logs while managing additional metadata. Striim supports enterprise workloads with:
high-speed capture (more than 7x faster than the native SQL Server CDC feature)
exactly-once processing (E1) to prevent data loss or duplication
support for long-running transactions
Striim provides reliable, high-speed SQL Server CDC
As shown in the diagram above, Striim’s SQL Server CDC Reader doesn’t require the use of a change table. Striim reads the change data from the SQL Server transaction logs; performs in-memory transformations, enrichment and analysis; and delivers data to target systems…with millisecond latencies.
Striim enables users to migrate and replicate data from SQL Server databases to cloud and on-premise data warehouses, data lakes, messaging systems, files, and big data targets.