Table Stream Change Data Capture
Table Stream is the Change Data Capture (CDC) mechanism of Singdata Lakehouse, used to capture change data of table objects. By defining a Table Stream object, you can record and track data changes based on an existing table.
Quick Guide: Is This What You Need?
If you are familiar with Kafka, you can think of Table Stream as a "consumer offset on a table" -- it records the last version you read, and the next query only returns rows that changed after that point. Unlike a Kafka consumer group, the offset only advances when you consume with DML statements; it will not lose data due to ordinary queries.
Quick Selection:
| Scenario | Recommended Solution |
|---|---|
| Capture row-level changes (INSERT/UPDATE/DELETE) on a table | Table Stream (this document) |
| Continuously import data from Kafka or OSS into a table | Pipe |
| Automatically maintain derived tables with aggregated or transformed results | Dynamic Table |
What Is Table Stream
Table Stream leverages the multi-version history feature of Lakehouse Table, recording the specified version (or the latest version) of the source table as the initial read position when created. When you query the Table Stream, it returns all change records from the initial offset to the current latest version.
Core Features:
- Table Stream does not store actual data; it only records and maintains the data version offset of the source table
- The offset is only updated to the latest data version when Table Stream is consumed using DML statements (INSERT/DELETE/UPDATE/MERGE)
- Based on the MVCC mechanism, no additional log storage or complex CDC configuration is required
Working Principle
Table Stream Types
| Type | Tracking Scope | Applicable Scenarios |
|---|---|---|
| STANDARD | All DML changes (INSERT, UPDATE, DELETE, TRUNCATE) | ETL scenarios requiring complete change data capture |
| APPEND_ONLY | INSERT operations only | Log-type, event-type append-only data scenarios |
STANDARD Type Details
The STANDARD type provides row-level changes by joining and processing all delta data changes to provide row-level increments. Delta changes reflect the latest state of the source object, not historical changes.
For example:
- If a row is inserted and then updated after the Table Stream's offset, the delta change is the row after the update
- If a row is inserted and then deleted after the Table Stream's offset, the delta change is no such row
APPEND_ONLY Type Details
The APPEND_ONLY type only records INSERT operation data. Update and delete operations are not recorded.
For example: 10 rows are initially inserted into the table, then a delete operation removes 5 rows while the offset remains unmoved. The Table Stream still records all 10 rows.
How to Determine Change Types
In STANDARD mode, the __change_type field has four values, regardless of the SHOW_INITIAL_ROWS parameter:
__change_type | Meaning | Description |
|---|---|---|
INSERT | New row | An INSERT was performed on the source table |
UPDATE_BEFORE | Old value before update | Appears paired with UPDATE_AFTER; __commit_version is the old version number |
UPDATE_AFTER | New value after update | Appears paired with UPDATE_BEFORE; __commit_version is the new version number |
DELETE | Deleted row | A DELETE was performed on the source table; retains the field values of the deleted row |
An UPDATE operation produces two rows: UPDATE_BEFORE (old value) and UPDATE_AFTER (new value). Both rows have the same id but different __commit_version.
STANDARD Mode Example:
In APPEND_ONLY mode, __change_type is always INSERT. UPDATE and DELETE operations produce no records.
Actual Effect of the SHOW_INITIAL_ROWS Parameter
SHOW_INITIAL_ROWS controls whether existing data in the table at the time of Stream creation is visible. It does not affect the value of __change_type:
SHOW_INITIAL_ROWS | Behavior |
|---|---|
'FALSE' (default) | Data already in the table at Stream creation is invisible; only changes that occur after Stream creation are captured |
'TRUE' | Data already in the table at Stream creation is exposed as INSERT during the first consumption. After consuming the initial snapshot, subsequent changes produce UPDATE_BEFORE/UPDATE_AFTER/DELETE normally |
Standard pattern for consuming a STANDARD-mode Stream in a MERGE statement:
Using Table Stream
Prerequisites
Table Stream can be created directly on any regular table; no additional configuration is required.
Creating a Table Stream
Table Stream can be created directly on any regular table:
Querying a Table Stream
Consuming a Table Stream
When consuming a Table Stream using DML statements, the offset automatically updates:
Deleting a Table Stream
Data Change Visibility Timeliness
| Data Write Method | Table Stream Visibility Timeliness |
|---|---|
| DML (INSERT/UPDATE/DELETE) | Immediately visible after task completes successfully |
| Bulkload | Immediately visible after task completes successfully |
| Streaming Ingestion (Ingestion Service) | Visible after changes are committed, by default every 1 minute |
Relationship Between Table Stream and Dynamic Table
| Dimension | Table Stream | Dynamic Table |
|---|---|---|
| Positioning | Low-level CDC mechanism that captures table change data | High-level data processing feature that performs data transformation based on incremental computation |
| Stores Data | No, only records the offset | Yes, stores computation results |
| Performs Computation | No, only returns change records | Yes, executes defined SQL logic |
| Refresh Method | Returns changes on query; updates offset on consumption | Automatically refreshes at configured intervals |
| Typical Scenarios | Change data capture, real-time data synchronization | Data warehouse layered processing, metric aggregation |
Collaborative Working Mode
In practice, Table Stream and Dynamic Table often work together:
- ETL Data Processing Chain: Create a Table Stream on the source table to capture change data, then use a Dynamic Table to perform transformation and aggregation on these changes
- Multi-level Incremental Computation: Build a series of Dynamic Tables, where each Dynamic Table processes the incremental data produced by the previous table in sequence
- Real-time Data Analysis: Use Table Stream to capture business system data changes in real time, then use Dynamic Table for real-time analysis and computation
Best Practices
1. Create Directly
Table Stream can be created directly on any regular table; no additional configuration is required:
2. Choose the Right Type
- If you only need to capture newly added data (e.g., log tables, event tables), use APPEND_ONLY type for better performance
- If you need complete change data (including updates and deletes), use STANDARD type
3. Consume Promptly
Table Stream's change data depends on the source table's Time Travel retention period. If not consumed for an extended period, the source table's historical versions may be cleaned up, causing the Table Stream to be unable to obtain complete change data.
4. Monitor Offset Status
Use DESC TABLE STREAM to check the current offset status of a Table Stream:
stale = truemeans the source table's historical versions have been cleaned up, and the Table Stream may not be able to obtain complete changesoffsetof -1 means no data has been consumed yet
Notes
- Table Stream can be created directly on any regular table; no additional configuration is required
- When the source table is deleted, the associated Table Stream becomes invalid
- Table Stream offset updates only occur on DML consumption; queries alone do not update the offset
- In STANDARD mode, UPDATE produces
UPDATE_BEFORE+UPDATE_AFTERtwo rows, and DELETE produces aDELETErecord, regardless of theSHOW_INITIAL_ROWSparameter CREATE OR REPLACE TABLEmakes the associated Stream stale: ExecutingCREATE OR REPLACE TABLE(table rebuild) on the source table clears the table's historical versions, causing all Table Streams based on that table to immediately become stale (stale = true) and unable to obtain complete change data. If you need to modify the table structure, prefer usingALTER TABLEinstead ofCREATE OR REPLACE TABLE.
