Object Model Overview

The Singdata Lakehouse object model defines the types, hierarchy, and interactions of all manageable resources in the system. Understanding the object model helps you quickly locate the features you need, organize your data assets correctly, and design a sound data architecture.

Object Hierarchy

Objects in Lakehouse are organized in the following hierarchy, from outermost to innermost: Instance → Workspace → Schema.

Hierarchy description:

Level | Description | Contained objects
Instance | The service instance is the top-level container for all Lakehouse resources, including compute, storage, and metadata | Network Policy, Instance Role, Share, Catalog (three types: Workspace / External Catalog / SHARED)
Workspace | A Workspace is a MANAGED-type Catalog that provides both a data layer (Schemas and data objects) and a Studio layer (VCluster, users, job scheduling). Workspaces are isolated from each other by default | Connection, VCluster, Workspace Role, Workspace User, External Schema, Studio jobs (SQL / Python / Shell / sync tasks / workflows), and all subordinate Schemas
Schema | A Schema is the namespace for data objects within a Workspace, used for logical grouping and management of tables, views, and other objects | Table, Dynamic Table, View, Volume, Pipe, Table Stream, Index, Function, Synonym, Semantic View

Object Categories

Objects in Lakehouse are grouped by purpose into the following categories:

Organizational Hierarchy

Organizational hierarchy objects form the resource structure of Lakehouse:

  • Workspace — The native top-level namespace in Lakehouse, providing two layers of capability: a data layer (all Schemas and data objects, using three-part naming workspace.schema.table) and a Studio layer (independent user system, role-based permissions, VCluster, and job scheduling). Workspaces are fully isolated from each other by default. Belongs to the Instance level.

  • Catalog — The general top-level namespace concept introduced for federated queries. A Workspace is the MANAGED type (full capability). Two additional read-only types exist, neither of which has a Studio layer:

    • External Catalog (EXTERNAL) — Maps metadata from external data sources (Hive, Databricks, Iceberg, etc.), enabling direct queries on external data using three-part naming catalog.schema.table without data migration. Belongs to the Instance level.
    • SHARED — System-built shared datasets (TPC-H / TPC-DS), read-only. Belongs to the Instance level.
  • Schema — The logical namespace for data objects within a Workspace, used for layered management (e.g., ods / dwd / ads). Different Schemas within the same Workspace can reference each other. Belongs to the Workspace level.

  • External Schema — Created from an External Catalog, maps a Schema from an external data system into the current Workspace so users can query external data with standard SQL without data migration. Belongs to the Workspace level.
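
The three-part naming used for Workspaces and External Catalogs can be illustrated with a small name resolver. This is a minimal Python sketch rather than Lakehouse code: the function `qualify`, the `ObjectName` type, and the idea that shorter names are completed from a session's current Workspace and Schema are assumptions made for illustration.

```python
from typing import NamedTuple


class ObjectName(NamedTuple):
    workspace: str  # a Workspace, or another Catalog type
    schema: str
    name: str


def qualify(identifier: str, current_workspace: str, current_schema: str) -> ObjectName:
    """Expand a 1-, 2-, or 3-part identifier into a full three-part name."""
    parts = identifier.split(".")
    if not 1 <= len(parts) <= 3 or any(not p for p in parts):
        raise ValueError(f"invalid identifier: {identifier!r}")
    # Missing leading parts are filled from the (hypothetical) session defaults.
    defaults = [current_workspace, current_schema]
    return ObjectName(*(defaults[: 3 - len(parts)] + parts))


short = qualify("orders", "sales_ws", "ods")
full = qualify("hive_cat.raw.events", "sales_ws", "ods")
```

Here `short` resolves inside the current Workspace and Schema, while `full` is already fully qualified and is returned unchanged.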

Data Tables

Data tables are the core objects for storing and processing data. All belong to the Schema level:

  • Table — A columnar-storage structured data table supporting INSERT / UPDATE / DELETE. The foundational storage unit for all layers of a data warehouse.
  • Dynamic Table — Defines transformation logic in SQL; the system automatically refreshes results incrementally. Ideal for building ODS→DWD→ADS data pipelines without manual scheduling.
  • View — A virtual table that stores no data and is computed dynamically at query time. Useful for encapsulating complex SQL and enforcing column-level access control.
  • Materialized View — Pre-computes and physically stores query results. Suitable for frequently executed fixed aggregation queries, trading storage for query speed.
  • External Table — Data resides in an external system (Delta Lake, Hudi, Kafka, etc.); Lakehouse manages only the metadata. Suitable when you want to query data in place without migrating it.
  • Semantic View — Encapsulates multi-table JOIN and aggregation logic as a business semantic layer. BI tools and AI agents access data through semantic views, hiding the complexity of the underlying table structure.
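
The difference between a View (computed at query time) and a Materialized View (stored results) can be shown with a toy model. This is a conceptual Python sketch only: the names are hypothetical, and the explicit `refresh()` call is an assumption, since this page does not describe how Lakehouse refreshes materialized views.

```python
from collections import defaultdict

orders = [("east", 10), ("west", 5)]  # base table rows: (region, amount)


def total_by_region(rows):
    """The query that both the view and the materialized view wrap."""
    totals = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def view():
    # A view stores nothing: the query runs against the base table each time.
    return total_by_region(orders)


class MaterializedView:
    def __init__(self):
        self.refresh()

    def refresh(self):
        # Results are computed once and kept until the next refresh.
        self.stored = total_by_region(orders)

    def read(self):
        # Reading costs no recomputation, at the price of extra storage.
        return self.stored


mv = MaterializedView()
orders.append(("east", 7))
fresh = view()     # recomputed, so it includes the new row
stale = mv.read()  # still the stored result from before the insert
mv.refresh()
```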

File Storage

File storage objects belong to the Schema level and are used to manage unstructured data and object storage files:

  • Volume — A file storage mount point. Pipes read files from a Volume and write them into tables; External Functions can read model files stored in a Volume.
    • Internal Volume — User Volume (personal user space) and Table Volume (table-associated storage), created automatically with the instance.
    • External Volume — Mounts an existing object storage bucket (OSS / COS / S3). Data stays in place; Lakehouse accesses it via a Storage Connection.

Connection Objects

Connection objects belong to the Workspace level and centrally store authentication credentials for third-party services, avoiding hard-coded secrets in SQL:

  • Connection — Securely stores authentication information for third-party services. Access is controlled by the Workspace administrator.
    • API Connection — Stores invocation credentials for cloud functions, used by External Functions to call services such as Alibaba Cloud FC and Tencent Cloud SCF.
    • Storage Connection — Stores access keys for object storage, used by external Volumes and external tables (OSS, COS, S3).
    • Catalog Connection — Stores connection information for external metadata services, used by External Catalogs to connect to Hive Metastore and similar systems.

Data Pipelines and Change Capture

Data pipeline objects belong to the Schema level and handle automatic data flow and change tracking:

  • Pipe — Continuously monitors a Volume or Kafka topic and automatically writes newly arrived files or messages into a target table. Replaces manual polling scripts for fully automated file ingestion.
  • Table Stream — A cursor object that records incremental changes (INSERT / UPDATE / DELETE) on a table without storing the data itself. Downstream Dynamic Tables or jobs consume the Stream to implement CDC-driven incremental computation.
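
The cursor behaviour of a Table Stream can be sketched as an offset into a table's change log. This is an illustrative Python model, not the Lakehouse implementation: the `Table` and `TableStream` classes and the in-memory change log are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    rows: dict = field(default_factory=dict)        # key -> current value
    change_log: list = field(default_factory=list)  # (operation, key, value)

    def insert(self, key, value):
        self.rows[key] = value
        self.change_log.append(("INSERT", key, value))

    def update(self, key, value):
        self.rows[key] = value
        self.change_log.append(("UPDATE", key, value))

    def delete(self, key):
        value = self.rows.pop(key)
        self.change_log.append(("DELETE", key, value))


@dataclass
class TableStream:
    table: Table
    offset: int = 0  # the cursor: position of the next unread change

    def consume(self):
        """Return the changes since the last call and advance the cursor."""
        changes = self.table.change_log[self.offset:]
        self.offset = len(self.table.change_log)
        return changes


orders = Table()
stream = TableStream(orders)
orders.insert(1, 100)
orders.update(1, 120)
first_batch = stream.consume()
orders.delete(1)
second_batch = stream.consume()
```

The stream itself holds only an offset, which mirrors the point above that a Table Stream records changes without storing the data; each downstream consumer sees every change exactly once.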

Indexes

Indexes belong to the Schema level and build auxiliary data structures on tables to accelerate filter conditions without changing the physical storage layout:

  • Bloomfilter Index — Suited for equality queries (=, IN). Minimal storage overhead in exchange for significantly fewer unnecessary block reads.
  • Inverted Index — Suited for full-text search and keyword matching, with support for Chinese tokenization.
  • Vector Index — Suited for semantic similarity search, supporting ANN (approximate nearest neighbor) acceleration for vector retrieval.
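
To show why a Bloomfilter Index reduces block reads for equality queries, here is a minimal Bloom filter in Python. It is a sketch under stated assumptions: the bit-array size, the number of hash functions, and the use of SHA-256 are arbitrary choices for illustration and say nothing about how Lakehouse builds its indexes.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value: str):
        # Derive several bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def make_block(values):
    """Pair a block of values with a filter built over them."""
    bloom = BloomFilter()
    for v in values:
        bloom.add(v)
    return bloom, values


def blocks_to_read(blocks, wanted):
    """Return indexes of blocks whose filter cannot rule out `wanted`."""
    return [i for i, (bloom, _) in enumerate(blocks) if bloom.might_contain(wanted)]


blocks = [make_block(["a1", "a2"]), make_block(["b1", "b2"])]
candidates = blocks_to_read(blocks, "a1")
```

A block that really contains the value is always in `candidates`; other blocks are skipped unless the filter reports a false positive, which is the trade-off a Bloom filter makes for its small size.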

Partitions and Bucketing

Partitions and bucketing belong to the Schema level and determine the physical organization of data. They are specified at table creation time and affect the data scan range at query time:

  • Partition — Physically groups data by time or business fields. Queries automatically skip irrelevant partitions, making this the primary optimization technique for large-table query performance.
  • Bucketing — Hashes data into buckets by specified columns, co-locating rows with the same key in the same bucket. Significantly improves data locality for JOIN and aggregation workloads.
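
Partition pruning and bucketing can be modelled together as a two-level lookup. The Python sketch below is an illustration only: the column names `dt` and `user_id`, the bucket count, and the use of CRC32 as the hash function are assumptions, not Lakehouse behaviour.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_of(key: str) -> int:
    # CRC32 is chosen only because it is deterministic across runs.
    return zlib.crc32(key.encode()) % NUM_BUCKETS


# partition value -> bucket id -> rows
storage = defaultdict(lambda: defaultdict(list))


def write(row: dict) -> None:
    storage[row["dt"]][bucket_of(row["user_id"])].append(row)


def scan(dt: str, user_id: str) -> list:
    """Read only the matching partition and bucket, then filter the rows."""
    candidates = storage.get(dt, {}).get(bucket_of(user_id), [])
    return [r for r in candidates if r["user_id"] == user_id]


write({"dt": "2024-01-01", "user_id": "u1", "amount": 10})
write({"dt": "2024-01-02", "user_id": "u1", "amount": 20})
write({"dt": "2024-01-01", "user_id": "u2", "amount": 5})
hits = scan("2024-01-01", "u1")
```

The query touches one partition and one bucket instead of the whole table, and rows with the same `user_id` always land in the same bucket, which is what makes bucketed JOINs local.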

Functions

Function objects belong to the Schema level:

  • User-Defined Functions — Encapsulate reusable computation logic in SQL or code, callable like built-in functions in any query.
    • SQL Function — Defined with SQL expressions and executed within the engine. Suitable for encapsulating business rules, calculation formulas, and other pure-SQL logic.
    • External Function — Registers an external HTTP service as a SQL function. Suitable for calling LLMs for text processing, vision services for image recognition, and other AI-augmented computation scenarios.

Synonyms

Synonym objects belong to the Schema level:

  • Synonym — Creates a local alias for an object in another Schema. When the ADS layer references dimension tables from the DIM layer, synonyms avoid writing the full three-part path (workspace.schema.table) in every query.

Data Sharing

Data sharing objects belong to the Instance level:

  • Share — A Provider instance grants access to specified tables or views to a Consumer instance within the same cloud and service region. The Consumer reads the Provider's original data directly — no data copying, no storage cost, no synchronization delay. Cross-cloud or cross-region sharing is not supported.

Studio Objects

Studio objects belong to the Workspace level and form Lakehouse's built-in data development and scheduling environment. They share the same user system and permission controls as the SQL data objects in the same Workspace:

  • SQL Job — Write and schedule SQL data processing logic in the Studio IDE, with support for dependency orchestration and time-based triggers.
  • Python / Shell Job — Run custom scripts to handle complex logic that SQL cannot cover.
  • Data Sync Job — Visually configure real-time CDC sync or offline batch sync for 40+ data sources without writing code. Runs on an Integration VCluster under the hood.
  • Workflow (Composite Job) — Orchestrates multiple jobs into a dependency-aware DAG for unified scheduling and monitoring.
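
The dependency-aware ordering that a Workflow provides can be sketched with a topological sort. This Python example uses the standard-library `graphlib` module; the job names are hypothetical and the sketch says nothing about how the Studio scheduler is implemented.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict) -> list:
    """Return jobs in an order where every job runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


# job -> set of jobs it depends on (hypothetical job names)
workflow = {
    "load_ods": set(),
    "build_dwd": {"load_ods"},
    "build_dws": {"build_dwd"},
    "build_ads": {"build_dws", "build_dwd"},
}

run_order = execution_order(workflow)

try:
    execution_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    # A workflow with circular dependencies is not a DAG and cannot be scheduled.
    cycle_detected = True
```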

Compute Resources

Compute resource objects belong to the Workspace level:

  • VCluster (Compute Cluster) — An elastic compute resource pool that starts and stops on demand with no charges when idle. Multiple VClusters can be created within the same Workspace to isolate different workloads.
    • General-purpose (GP VC): Suitable for mixed ETL and query workloads.
    • Analytics (AP VC): Optimized for large-scale OLAP queries; ideal for BI and ad-hoc analysis.
    • Integration (Integration VC): Designed for real-time CDC sync tasks with low-latency writes.

Security Policies

Security policy objects protect data and control access:

  • Network Policy — IP-based access control (allowlist / blocklist) that blocks unauthorized sources at the instance entry point. Belongs to the Instance level.
  • Dynamic Data Masking Policy — Dynamically replaces sensitive values in specified columns based on the user's role (e.g., phone numbers displayed as 138****8888). Query results are automatically masked; the underlying data is unchanged. Belongs to the Schema level (bound to table columns).
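
The role-based masking described above can be sketched as a function applied at read time. This is a Python illustration, not the Lakehouse policy syntax: `mask_phone`, `read_phone`, and the choice of which role sees raw values are assumptions.

```python
# Hypothetical: roles allowed to see unmasked values.
UNMASKED_ROLES = {"workspace_admin"}


def mask_phone(value: str) -> str:
    """Keep the first 3 and last 4 characters and star out the rest."""
    if len(value) < 7:
        return "*" * len(value)
    return value[:3] + "*" * (len(value) - 7) + value[-4:]


def read_phone(stored_value: str, role: str) -> str:
    # Masking happens when the query result is produced;
    # the stored value itself is never modified.
    return stored_value if role in UNMASKED_ROLES else mask_phone(stored_value)


stored = "13800008888"
seen_by_dev = read_phone(stored, "workspace_dev")
seen_by_admin = read_phone(stored, "workspace_admin")
```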

Identity and Permissions

Users follow a two-tier model: created at the instance level, authorized at the workspace level.

  • User — Created in the account console. Belongs to the Instance level. Newly created users have no data permissions by default; they must be added to a Workspace and granted a role before they can access its resources.
  • Role — A collection of permissions. Roles simplify permission management through batch authorization.
    • Instance Role — An instance-level role that applies across the entire service instance (e.g., instance_admin, instance_user). Belongs to the Instance level.
    • Workspace Role — A workspace-level role that applies only within a specific Workspace (e.g., workspace_admin, workspace_dev). Belongs to the Workspace level.
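
The two-tier model (users created at the instance level, authorized per Workspace) can be sketched as two lookups. The Python below is a conceptual model: the role names come from the examples above, but the privilege sets attached to them and all function names are assumptions.

```python
instance_users = set()  # users created at the instance level
workspace_roles = {}    # (workspace, user) -> role name
role_privileges = {     # hypothetical privilege sets per Workspace Role
    "workspace_admin": {"read", "write", "grant"},
    "workspace_dev": {"read", "write"},
}


def create_user(user: str) -> None:
    instance_users.add(user)


def grant_role(workspace: str, user: str, role: str) -> None:
    if user not in instance_users:
        raise KeyError(f"unknown user: {user}")
    if role not in role_privileges:
        raise KeyError(f"unknown role: {role}")
    workspace_roles[(workspace, user)] = role


def can(user: str, workspace: str, action: str) -> bool:
    # A user without a role in this Workspace has no permissions in it.
    role = workspace_roles.get((workspace, user))
    return role is not None and action in role_privileges[role]


create_user("alice")
before_grant = can("alice", "sales", "read")
grant_role("sales", "alice", "workspace_dev")
after_grant = can("alice", "sales", "read")
```

A role granted in one Workspace gives no access to another, which matches the default isolation between Workspaces.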

Advanced Table Features

The following are configurable features at the table level, not independent object types:

  • Time Travel — Access historical versions of a table to recover from accidental deletions or modifications. Use TIMESTAMP AS OF to query data at any historical point in time.
  • Data Lifecycle Management — Set expiration policies for tables or partitions to automatically reclaim expired data and control storage costs.
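
Time Travel can be pictured as looking up the table version that was current at a given moment. The Python sketch below assumes that a query "as of" a timestamp returns the latest version committed at or before that timestamp; the `VersionedTable` class and integer timestamps are illustrative, not part of Lakehouse.

```python
import bisect


class VersionedTable:
    def __init__(self):
        self._timestamps = []  # commit times, strictly ascending
        self._snapshots = []   # table contents after each commit

    def commit(self, ts: int, rows: list) -> None:
        if self._timestamps and ts <= self._timestamps[-1]:
            raise ValueError("commit timestamps must increase")
        self._timestamps.append(ts)
        self._snapshots.append(list(rows))

    def as_of(self, ts: int) -> list:
        """Return the latest snapshot committed at or before `ts`."""
        i = bisect.bisect_right(self._timestamps, ts)
        if i == 0:
            raise LookupError("no version exists at or before this time")
        return self._snapshots[i - 1]


table = VersionedTable()
table.commit(10, ["row_a", "row_b"])
table.commit(20, ["row_a"])       # row_b deleted by mistake
recovered = table.as_of(15)       # state before the accidental delete
```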

Typical Architecture Patterns

Multi-Cloud · Multi-Region · Multi-Instance

Regardless of which cloud or region a deployment targets, every Lakehouse instance presents a fully consistent SQL syntax, object model, API, and permission system. Business teams can switch deployment environments without rewriting any code.

Each region of each cloud provider can independently host one or more Lakehouse instances. Instances are fully isolated from each other, each with its own compute, storage, metadata, and access controls.

Currently supported cloud providers and regions:

Cloud Provider | Region
Alibaba Cloud | East China 2 (Shanghai), North China 2 (Beijing), Singapore
Tencent Cloud | North China (Beijing), East China (Shanghai), South China (Guangzhou)
AWS | North China (Beijing), Singapore

Use cases: Multi-region disaster recovery, independent instances for overseas operations, compliance requirements for data residency.


Multiple Workspaces in One Instance — Business Line Isolation

A single instance can host multiple Workspaces. Different business lines use independent Workspaces, achieving full isolation of users, permissions, compute clusters, and data objects. Workspaces are not accessible to each other by default; cross-Workspace access requires explicit authorization.

The core value of this pattern is isolation: ETL jobs from the data platform team do not affect BI team query performance; experimental operations by the algorithm team cannot accidentally modify production data; data permissions across business lines remain independent.

Typical division:

  • Data Platform Workspace: Managed by data engineers, runs ETL and CDC sync jobs, holds write permissions.
  • Business Analytics Workspace: For analysts and BI teams, read-only access, connected to BI tools, uses a dedicated Analytics VCluster.
  • AI / ML Workspace: For algorithm engineers, runs vector search and LLM inference jobs, uses an AI-dedicated VCluster.

Multiple Schemas in One Workspace — Data Warehouse Layering

Within a single Workspace, Schemas implement data warehouse layering. Each layer's data objects are managed independently, and layers are connected through Dynamic Tables that automatically refresh incrementally. This is the recommended data warehouse architecture for Singdata Lakehouse: use Schemas to define layers, use Dynamic Tables to replace manual scheduling, and let the data pipeline run automatically.

Standard layers:

Schema | Role | Primary objects
ods | Raw data layer, source-aligned storage | Table, Pipe, Table Stream, External Table
dwd | Detail data layer, cleansed and transformed | Dynamic Table, Partition, Dynamic Mask
dws | Summary data layer, aggregated metrics | Dynamic Table, Materialized View, Bloomfilter Index
ads | Application data layer, externally exposed | Table, View, Semantic View, Synonym
dim | Dimension layer, reused across layers | Table (slowly changing dimensions), Table Stream, Synonym

Zero-Copy Cross-Account Data Sharing Within the Same Cloud and Region

Using the Share object, a Provider instance can share tables or views in real time with a Consumer instance in the same cloud and service region. The Consumer queries the Provider's original data directly — no copying, no storage cost, no synchronization delay.

This pattern is suitable for sharing data across subsidiaries within a corporate group, or for data service providers exposing datasets to customers. The Consumer has read-only access and cannot modify the Provider's data; the Provider can revoke authorization at any time.

Constraint: Only cross-instance sharing within the same cloud provider and service region is supported. Cross-cloud or cross-region sharing is not supported.

Operation flow:

-- Provider: create a Share object and grant access
CREATE SHARE my_share;
ALTER SHARE my_share ADD TABLE ads.order_summary;
ALTER SHARE my_share ADD INSTANCE consumer_instance_id;

-- Consumer: create a read-only Schema from the Share and query
CREATE SCHEMA shared_data FROM SHARE provider_instance.my_share;
SELECT * FROM shared_data.order_summary;
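
The zero-copy, read-only, and revocable properties of a Share can be modelled in a few lines. This Python sketch is conceptual: the `Share` class and its methods are hypothetical, and a read-only proxy over the provider's dictionary stands in for the Consumer reading the Provider's original data.

```python
from types import MappingProxyType


class Share:
    def __init__(self, name: str):
        self.name = name
        self.tables = {}        # table name -> the provider's own table object
        self.consumers = set()  # instance ids allowed to read

    def add_table(self, name: str, table: dict) -> None:
        self.tables[name] = table  # stores a reference, not a copy

    def add_instance(self, instance_id: str) -> None:
        self.consumers.add(instance_id)

    def remove_instance(self, instance_id: str) -> None:
        self.consumers.discard(instance_id)

    def read(self, instance_id: str, table: str):
        if instance_id not in self.consumers:
            raise PermissionError(f"{instance_id} has no access to {self.name}")
        # A live, read-only view over the provider's data.
        return MappingProxyType(self.tables[table])


order_summary = {1: "2024-01 total"}
share = Share("my_share")
share.add_table("order_summary", order_summary)
share.add_instance("consumer_instance_id")

consumer_view = share.read("consumer_instance_id", "order_summary")
order_summary[2] = "2024-02 total"   # provider writes a new row
visible_immediately = consumer_view[2]
```

Because the consumer holds a view rather than a copy, provider writes are visible without any synchronization step, and removing the instance from the share ends access to new reads.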


Object Relationship Quick Reference

Scenario | Objects involved
Data ingestion (files) | Volume → Pipe → Table
Data ingestion (database CDC) | Connection → Real-time sync job (Studio) → Table
Data processing (incremental) | Table → Dynamic Table → Table
Data processing (CDC consumption) | Table → Table Stream → Dynamic Table (or job) → Table
Federated query (in-place acceleration, no data migration) | External Catalog / External Schema → Query
Data sharing | Share → Cross-instance access within the same cloud and region (cross-cloud/cross-region not supported)
Query acceleration | Materialized View / Index / Partition → Table
AI-augmented analytics | Vector Index + Inverted Index + Semantic View