Data Transformation
Singdata Lakehouse covers four data transformation patterns: auto-incremental refresh pipelines (Dynamic Table), scheduled batch ETL (Studio task scheduling), change data capture (Table Stream), and query acceleration (Materialized View). Choose based on your latency requirements and trigger model.
I want to build a data pipeline (ODS → DWD → ADS)
Recommended: Dynamic Table, define a SQL query and the system automatically computes incremental updates and maintains the result — no manual scheduling required.
| Scenario | Approach | Reference |
|---|---|---|
| Multi-layer pipeline with automatic incremental refresh | Dynamic Table | Dynamic Table Overview · Create Dynamic Table |
| End-to-end real-time ETL example | Dynamic Table tutorial | Real-time ETL with Dynamic Tables |
| Near-real-time incremental processing pipeline | Dynamic Table + Table Stream | Build a Near-Real-Time Incremental Pipeline with Dynamic Tables |
| CDC data processing (real-time database change processing) | Multi-table real-time sync + Dynamic Table | Complete CDC and Data Processing Example |
| Implement SCD (Slowly Changing Dimensions) | Table Stream + task | SCD Implementation Guide |
I want to run scheduled batch ETL (T+1 / hourly)
Recommended: Studio SQL task + scheduling, write SQL in Studio, configure a Cron schedule, and monitor run status visually.
| Scenario | Approach | Reference |
|---|---|---|
| SQL transformation task + periodic scheduling | Studio SQL task | Task Development and Scheduling · Quick ETL Setup |
| Multi-task orchestration (with dependencies) | Composite task / task group | Composite Task · Task Group |
| Python processing required (pandas / custom logic) | Studio Python task | Python Task Development |
| Data modeling with dbt | dbt + Lakehouse | Incremental Development with dbt on Lakehouse |
I want to detect table data changes (CDC / incremental-driven)
Recommended: Table Stream, captures INSERT / UPDATE / DELETE changes on a table to drive downstream incremental processing.
| Scenario | Approach | Reference |
|---|---|---|
| Capture table changes to drive downstream processing | Table Stream | Table Stream Overview · Create Table Stream |
| Table Stream best practices | — | Table Stream Best Practices |
I want to accelerate queries (pre-computation / cached results)
Recommended: Materialized View, pre-computes and stores query results with automatic query rewriting for transparent acceleration.
| Scenario | Approach | Reference |
|---|---|---|
| Pre-compute high-frequency complex queries | Materialized View | Materialized View Overview · Create Materialized View |
| Query rewriting (transparent acceleration) | Materialized View + query rewriting | Query Rewriting |
I want to do SQL data transformation (cleansing / aggregation / joining)
| Scenario | Reference |
|---|---|
| SQL transformation basics | SQL Data Transformation Basics |
| Window functions (YoY / MoM / ranking) | Data Transformation with Window Functions |
| Complex queries with CTEs | Data Transformation with CTEs |
| Nested data types (Array / Map / Struct) | Nested Data Type Transformation |
| JSON data processing | JSON Processing Guide for Complex Business Cases |
| Practical tips | SQL Transformation Tips |
| Funnel analysis and user behavior | Funnel Analysis Guide |
| Session analysis (Sessionization) | Session Analysis Guide |
| Retention and cohort analysis | Retention and Cohort Analysis Guide |
| Marketing attribution analysis | Attribution Analysis Guide |
| Hierarchical queries (org charts / BOM) | Hierarchical Query Workaround |
| Data deduplication | Data Deduplication Guide |
| Data pivoting (rows to columns / columns to rows) | Data Pivot and Transpose Guide |
| Cumulative calculations and running totals | Running Total Guide |
I want to ensure data quality
Recommended: Studio Data Quality Rules (DQC), configure validation rules to automatically intercept anomalous data before processing.
| Scenario | Approach | Reference |
|---|---|---|
| Configure data quality check rules | Studio DQC | Quick Data Quality Rule Setup |
| Complete data quality guide | DQC rule configuration | Data Quality |
I want to monitor pipeline run status
Recommended: Studio Operations Monitoring, visually view task run status, logs, and alerts.
| Scenario | Approach | Reference |
|---|---|---|
| View task run logs | Studio Operations Center | Quick Monitoring and Alerting Setup |
| Troubleshoot task failures | Job Profile diagnostics | Job History Analysis |
| DataOps production practices | Complete operations guide | DataOps Data Safety and Stability Practices |
Not sure which tool to use?
For a complete tool selection guide, see: Real-Time Pipeline Selection Guide
