Create Dynamic Table (Dynamic Table)
A Dynamic Table is like an "auto-updating formula sheet": you only need to define the SQL computation logic, and whenever the source table has new data, the system automatically updates the results incrementally. When querying, you read the pre-computed results directly without re-running the full computation or manually configuring scheduling tasks.
Selection reference:
| Scenario | Recommended Solution |
|---|---|
| Aggregation or transformation results that need periodic refresh and must be queryable at any time | Dynamic Table |
| Need to capture row-level changes (INSERT / UPDATE / DELETE) from source tables | Table Stream |
| Need to continuously import raw data from Kafka or object storage | Pipe |
| Complex scheduling logic, dependencies on multiple upstream/downstream tasks, or non-SQL processing | Studio Task |
Feature Overview
A Dynamic Table is a data object that automatically refreshes incrementally based on a query definition. The data processing logic is defined via SQL query at creation time, and during refresh, incremental data from source tables is automatically retrieved and computed using incremental algorithms.
For more usage methods, refer to Dynamic Table Introduction and Incremental Computation Principles.
Creation Syntax
Simplified Syntax (Recommended)
Examples:
Full Syntax
Parameter Reference
Required Parameters
dtname: Dynamic table name, supportsschema_name.table_nameformat for cross-schema creation.AS query: Defines the computation logic of the dynamic table; results are computed based on this query at each refresh.
Optional Parameters
-
IF NOT EXISTS: Skip creation without error — use when a dynamic table with the same name already exists; cannot be specified together withOR REPLACE. -
OR REPLACE: Update the table definition in place, preserving existing data and permissions — use when modifying column structure or SQL logic to avoid data loss from dropping and recreating the table.Whether the next REFRESH after
OR REPLACEis incremental or full depends on the type of change:
| Change Type | Next Refresh Mode |
|---|---|
| Only adding/removing pass-through columns (no change to computation logic) | Incremental |
| Modifying WHERE conditions, JOIN keys, or GROUP BY keys | Full |
Adding columns involved in computation (e.g., j * 1) | Full |
-
column_list: Specify names or comments for output columns — required when SELECT contains expressions (e.g.,j + 1), otherwise column names are unpredictable. Column types are automatically inferred from the SELECT result ofAS queryand cannot be specified here; useCASTin SELECT for explicit type control. -
PARTITIONED BY (col): Partition result data by the specified column — suitable for large datasets where queries frequently filter by a column (e.g., by date or region), significantly reducing scan range. -
CLUSTERED BY (col): Hash-bucket result data by the specified column — suitable for large datasets where queries frequently JOIN or aggregate by a column, reducing data skew. It is recommended to choose columns with a large value range and few duplicate keys; generally estimate bucket count at 128MB–1GB per bucket; defaults to 256 buckets if not specified.SORTED BY (col): Optional, specifies the sort order within buckets; recommended to align withCLUSTERED BYfor better query performance. -
COMMENT: Add a comment to the dynamic table. -
refreshOption(refresh options): Core configuration controlling data freshness and computation cost, consisting of the following sub-parameters:-
START WITH timestamp_expr: Specifies the time point for the first refresh — if not set, starts immediately upon creation. Supports timestamp literals or expressions, such as'2023-11-07 14:49:18'orcurrent_timestamp() - interval '12' hours. The time point cannot be earlier than the Time Travel retention range of the source table (default 1 day), otherwise an error is reported. -
INTERVAL interval_time: Controls refresh frequency, directly determining data freshness and compute resource consumption — shorter intervals mean more timely data but more compute resources consumed. Minimum value is 1 minute (INTERVAL '1' MINUTEorINTERVAL '60' SECOND). If onlySTART WITHis specified withoutINTERVAL, the refresh occurs only once at the specified time.
-
| Syntax | Description | Example |
|---|---|---|
INTERVAL 'n' DAY | Specify day interval | INTERVAL '1' DAY means 1 day |
INTERVAL 'n' HOUR | Specify hour interval | INTERVAL '23' HOUR means 23 hours |
INTERVAL 'n' MINUTE | Specify minute interval | INTERVAL '59' MINUTE means 59 minutes |
INTERVAL 'n' SECOND | Specify second interval | INTERVAL '59.999' SECOND means 59.999 seconds |
-
VCLUSTER vc_name: Specifies the compute cluster for executing refresh tasks — automatic refresh continuously consumes compute resources, so this must be explicitly specified to avoid occupying interactive query clusters. Defaults to the current session's cluster if not specified.
Scheduling Methods
Dynamic Tables support two scheduling methods:
| Method | Use Case | Drawbacks |
|---|---|---|
| DDL built-in scheduling (REFRESH INTERVAL) | Quick validation, development testing | No alerts, no dependency orchestration; can only check status manually via SQL |
| Studio Task scheduling (recommended) | Production environments | Requires creating an additional Task |
Studio Task scheduling is recommended for production environments. DDL built-in scheduling (REFRESH INTERVAL) cannot configure alerts; refresh failures will not proactively notify you, and you can only check status manually by executing SHOW DYNAMIC TABLE REFRESH HISTORY.
DDL Built-in Scheduling
Write a REFRESH INTERVAL clause in the CREATE statement, and Lakehouse automatically triggers it at the specified frequency:
Studio Task Scheduling
Create a scheduling task in Studio with the REFRESH command as the task content.
Non-partitioned DT / dynamically partitioned DT:
Statically partitioned DT (with parameters):
Self-dependency must be configured: Concurrent REFRESH on the same DT is prohibited (it causes write conflicts or data inconsistency). Tasks must enable self-dependency to ensure the next instance starts only after the previous one completes. Different partitions of a statically partitioned DT can be refreshed in parallel; the same partition, non-partitioned, and dynamically partitioned DTs prohibit concurrent refresh.
Upstream dependencies: If the DT's source table data needs to wait for upstream task output before refreshing, configure upstream dependencies; this is not needed if the source table is a real-time write table.
Alert configuration: In production environments, it is recommended to configure failure alerts, timeout alerts, and no-run alerts.
Scheduling Orchestration for Multi-level DT Pipelines
When multiple DTs form upstream-downstream dependencies (e.g., DT_A → DT_B → DT_C), each DT corresponds to a Studio Task, and execution order is guaranteed through task dependency relationships:
Notes
- Scheduling method: In production environments, prefer Studio Task scheduling for REFRESH to enable dependency orchestration with upstream/downstream tasks and unified monitoring and alerting. See DT Scheduling and Deployment Standards.
- Time Travel dependency: Incremental refresh of dynamic tables is based on historical versions of base tables. Historical versions depend on the TIME TRAVEL (
data_retention_days) parameter. An error is reported if the specified version does not exist. Lakehouse retains data for one day by default.
Parameterized Dynamic Tables
Dynamic Tables support parameterized definitions, used to convert traditional offline scheduling tasks (e.g., select * from source_table where pt=${bizdate}) into incremental tasks.
Parameterized definitions consist of two parts:
- Define parameters: Use
SESSION_CONFIGS()['dt.args.xx']in SQL instead of hardcoded partition values. - Pass parameters: Pass specific values via
SET dt.args.xx = value;at refresh time.
Example
Refresh Behavior
| Trigger Scenario | Refresh Mode | Description |
|---|---|---|
| First REFRESH after initial creation | Full | No historical baseline; must scan all data to establish initial state |
| First refresh of a partition (partitioned table) | Full | No historical baseline for this partition |
| Parameterized DT: parameter value same as last time | Incremental | SQL filter condition unchanged; only incremental data is processed |
| Parameterized DT: parameter value changed | Full | Parameter change is equivalent to modifying the table's computation definition |
OR REPLACE modified pass-through columns (added/removed columns) | Incremental | Column changes do not affect the computation logic of other columns |
OR REPLACE modified JOIN key / GROUP key or processing logic | Full | Historical data needs to be recomputed with the new logic |
| DML write to dynamic table followed by REFRESH | Full | Data state has been externally modified; incremental baseline is unreliable |
Multi-level Partition Refresh
When refreshing a partitioned table, the partition_spec must be specified in the order of the table's partition hierarchy. Intermediate levels cannot be skipped.
-
Valid specification: Specify higher-level and some lower-level partitions.
-
Invalid specification: Skip an intermediate level (e.g., skip hour and directly specify min).
Partition Strategy Selection
Dynamic Tables have two creation syntaxes: static partition DT and dynamic partition DT (non-partitioned DT can be considered a special case of dynamic partition). The two differ fundamentally in creation syntax, refresh behavior, and incremental behavior.
Core Concepts
Static partition DT (partitioned DT with SESSION_CONFIGS parameters): SQL references partition parameters via SESSION_CONFIGS(), and a specific partition value is specified at each REFRESH. Each partition is refreshed independently and can be viewed as an independent DT unit.
Dynamic partition DT (non-partitioned DT / DT without parameters): SQL does not reference SESSION_CONFIGS(), or although partitioned, partition values are dynamically generated by the query logic. Each REFRESH processes all incremental data from all source tables.
Dynamic partition DTs do not allow any command other than REFRESH to modify data (INSERT / UPDATE / DELETE / MERGE are all unavailable); data is entirely driven by REFRESH. Therefore, the following ETL scenarios are not suitable for dynamic partition DTs:
- Need to manually backfill data (e.g., discovered a few rows of incorrect data that need direct UPDATE)
- Need to delete data by condition (e.g., cleaning dirty data, deleting expired records)
- Need MERGE INTO for upsert (e.g., consuming Stream after CDC and merging into target table)
- Need INSERT INTO to append external data (e.g., manually importing a batch of supplementary data)
- Need to independently backfill or re-refresh a specific partition (dynamic partition DTs can only refresh the entire table, not individual partitions)
Key Differences
| Dimension | Static Partition DT | Dynamic Partition DT |
|---|---|---|
| Does SQL contain SESSION_CONFIGS()? | Yes, for referencing partition parameters | No |
| REFRESH syntax | REFRESH ... PARTITION(ds='xxx') | REFRESH ... (without PARTITION) |
| Incremental scope | Only processes incremental data for the specified partition | Processes all incremental data from all source tables |
| Scheduling method | External scheduler triggers one partition at a time | Triggered by time or external scheduler |
| Data lifecycle | Managed per partition; can independently backfill/delete | Managed uniformly for the entire table |
| State table | Maintained independently per partition | Maintained globally |
| Suitable data patterns | T+1 batch processing, time-partitioned ETL | Real-time streams, global aggregations, no clear partition key |
Selection Decision Tree
Partition Granularity Selection
After choosing static partition DT, you also need to determine partition granularity:
| Data Pattern | Recommended Granularity | Description |
|---|---|---|
| Roughly ordered, few late-arriving records | Hourly (dt_hour) | Balances granularity and management complexity |
| T+1 batch import | Daily (ds) | Most common ETL scenario |
| By business cycle | Weekly/Monthly | Reporting scenarios |
| Multi-level partitions | Day + Hour (ds, hour) | Requires finer lifecycle management |
Selection principles:
- Finer granularity → smaller data volume per refresh → higher incremental efficiency
- Finer granularity → more partitions → more complex management and scheduling
- Granularity should match data write frequency: if data is written hourly, partition granularity should not be finer than hourly
Inferring Partition Strategy from Original ETL
| Original ETL Pattern | Recommended DT Partition Strategy |
|---|---|
| INSERT OVERWRITE TABLE t PARTITION(ds='${ds}') | Static partition DT, daily |
| INSERT OVERWRITE TABLE t PARTITION(ds='${ds}', hour='${hour}') | Static partition DT, day+hour |
| INSERT OVERWRITE TABLE t PARTITION(ds) (dynamic partition write) | Dynamic partition DT or static partition DT (depends on whether lifecycle management by partition is needed) |
| INSERT INTO TABLE t SELECT ... (no partition) | Dynamic partition DT |
| INSERT OVERWRITE TABLE t SELECT ... (full table overwrite) | Dynamic partition DT |
Scenario Examples
Example 1: Converting an Offline Task to an Incremental Task
Original offline SQL:
Conversion steps:
- Replace
${bizdate}withSESSION_CONFIGS()['dt.args.bizdate']. - Create the Dynamic Table.
- Schedule the refresh command.
Example 2: Multi-table Join Analysis
Reference Documentation
- Dynamic Table Visual Interface Development
- View Dynamic Table Details
- View All Dynamic Tables in a Schema
- Modify Dynamic Table
- Drop Dynamic Table
- View Dynamic Table Refresh History
- View Dynamic Table DDL
- Recover Dropped Dynamic Table
- Restore Dynamic Table to a Specific Version
- View Dynamic Table Version History
- View Dynamic Table Data at a Specific Version
- Dynamic Table Introduction
Related Guides
- Real-time Data Pipeline Selection Guide: Comparison of Pipe, Table Stream, and Dynamic Table selection
- Dynamic Table Development Guide: Incremental refresh mechanism, scheduling configuration, and comparison with Tasks
