Data Tables

Data tables are the core objects for storing and processing data in Singdata Lakehouse. Standard tables and Dynamic Tables use Parquet columnar storage — at query time only the required columns are read, making them well-suited for large-scale analytical queries. Unlike row-oriented databases such as MySQL, data is automatically organized by column and compressed on write, dramatically reducing I/O. Unlike Hive's static partitions, Lakehouse uses a hidden partition mechanism similar to Apache Iceberg, allowing partition strategies to be modified without affecting existing data.

Choosing a Table Type

Table typeHow data is maintainedTypical use cases
Standard TableManual INSERT/UPDATE/DELETEODS raw data, dimension tables, CDC sync targets
Dynamic TableAutomatic incremental refreshDWD/DWS/ADS layers, query-driven automatic computation
ViewNo data stored, computed dynamically at query timeLogic encapsulation, simplifying complex queries
Materialized ViewAutomatically refreshed, pre-computed results storedTransparent query acceleration, single-table query optimization
External TableData in external systems, Lakehouse manages metadataFederated queries, data lake access
Semantic ViewEncapsulates business semantics, supports natural language queriesAI conversational analytics, unified business metric definitions

Which table type should you use?

Step 1: Where does the data come from?

  • Synced from an external source (MySQL/PostgreSQL/Kafka) → sync job writes to a Standard Table
  • File import (CSV/Parquet/JSON) → COPY INTO writes to a Standard Table
  • SQL query results → Dynamic Table (auto-refresh) or View (no storage)

Step 2: How will the data be used?

ScenarioRecommendationReason
ODS raw data layerStandard TablePrecise control over write timing
DWD cleansed data layerDynamic TableAutomatically computes incrementally from ODS, no scheduling needed
DWS metrics summary layerDynamic TableAutomatically aggregates from DWD
Frequently queried intermediate resultsMaterialized ViewTransparent acceleration, users don't need to change SQL
Simplifying complex queriesViewNo data stored, zero cost
AI conversational analyticsSemantic ViewEncapsulates business semantics
Federated queries on external dataExternal TableData stays outside Lakehouse

Storage Format

Lakehouse tables use Parquet columnar storage by default:

  • Efficient compression: Consecutive storage of same-type data yields high compression ratios.
  • Query optimization: Only the columns needed by a query are read, reducing I/O.
  • Schema evolution: Supports adding columns, modifying column types, and other schema changes.

Partitions and Bucketing

  • Partitions: Similar to Iceberg hidden partitions, supporting transform partitions (years/months/days/hours). Partition strategies can be modified without affecting existing data.
  • Bucketing: CLUSTER BY uses hash bucketing to optimize JOIN and aggregation operations.

Cost Impact at a Glance

OperationStorage impactCompute impact
Create tableMetadata only, near-zero costNone
Write dataBilled by Parquet-compressed data sizeConsumes VCluster CRU
Enable Time TravelRetains historical versions, increases storageNone
Create partitionNo additional storageReduces scan volume at query time
Create indexIndex data occupies additional storageAccelerates queries, reduces CRU