Iceberg External Tables
Overview
Singdata Lakehouse supports reading Apache Iceberg format data stored on object storage via an External Catalog. Unlike Paimon external tables, Iceberg requires connecting through a Catalog Connection and does not support directly specifying a table directory path.
Verified versions: Iceberg format v1, v2
Verified cloud providers: Alibaba Cloud OSS
Prerequisites
- Iceberg format data exists on object storage and is managed through an Iceberg Catalog service
- The Iceberg Catalog service is accessible from Lakehouse compute nodes (requires public network or internal network connectivity)
- A storage Connection has been created in Lakehouse (for reading data files)
Supported Catalog Types
| Catalog Type | Description |
|---|---|
| ICEBERG_REST | Iceberg REST Catalog (recommended, standardized interface) |
| HMS | Hive Metastore (compatible with Iceberg tables managed by Hive) |
Step 1: Create a Storage Connection
Create an OSS, COS, or S3 storage Connection as described in the Paimon external table documentation.
Step 2: Create a Catalog Connection
Iceberg REST Catalog
Iceberg REST Catalog deployment options:
| Option | Description |
|---|---|
| Apache Polaris | Cloud-native Iceberg REST Catalog, supports multi-cloud storage |
| Nessie | Open-source, supports version control |
| Tabular REST Server | Reference implementation (tabulario/iceberg-rest Docker image) |
| Self-hosted | Any service compatible with the Iceberg REST OpenAPI spec |
Hive Metastore (HMS)
Step 3: Create an External Catalog
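The statement takes the form CREATE EXTERNAL CATALOG ... CONNECTION ..., which binds the External Catalog to the Catalog Connection created in Step 2.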
Step 4: Query Iceberg Data
Query tables directly by their three-part name, catalog.schema.table, with no additional table creation; for example, SELECT * FROM catalog.schema.table.
Data Type Compatibility
The following types have been verified through testing (Iceberg format v2, Alibaba Cloud OSS, including boundary value and NULL tests):
| Iceberg Type | Lakehouse Inferred Type | Status | Notes |
|---|---|---|---|
| integer | INT | Supported | Including NOT NULL |
| long | BIGINT | Supported | |
| float | FLOAT | Supported | |
| double | DOUBLE | Supported | |
| boolean | BOOLEAN | Supported | Including NULL |
| string | STRING | Supported | Including Chinese characters, empty string, NULL |
| date | DATE | Supported | Range 1970-01-01 ~ 2099-12-31 |
| timestamp | TIMESTAMP_NTZ | Supported | No timezone, microsecond precision |
| decimal(p,s) | DECIMAL(p,s) | Supported | Including positive/negative/zero/NULL |
| binary | BINARY | Supported | |
| list<T> | ARRAY<T> | Supported | Including null elements |
| map<K,V> | MAP<K NOT NULL, V> | Supported | Key automatically inferred as NOT NULL |
| struct<...> | STRUCT<...> | Supported | Including null fields, supports Chinese values |
Partitioned tables: Identity partitioning is supported; partition fields automatically appear in the Partition Information section of DESC TABLE.
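The compatibility table above can also be expressed as a small lookup, which may be handy when checking in advance which Lakehouse type an Iceberg column is expected to be inferred as. The following is a minimal Python sketch, not part of Lakehouse: the names `PRIMITIVES`, `split_top_level`, and `lakehouse_type` are hypothetical, the type spellings are copied from the table, and struct types are deliberately left out because the table does not spell out their field syntax.

```python
import re

# Primitive mappings copied from the compatibility table above.
PRIMITIVES = {
    "integer": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "string": "STRING",
    "date": "DATE",
    "timestamp": "TIMESTAMP_NTZ",
    "binary": "BINARY",
}


def split_top_level(text):
    """Split on commas that are not nested inside <...> or (...)."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def lakehouse_type(iceberg_type):
    """Return the Lakehouse type expected for an Iceberg type string (hypothetical helper)."""
    t = iceberg_type.strip()
    if t in PRIMITIVES:
        return PRIMITIVES[t]
    decimal = re.fullmatch(r"decimal\((\d+)\s*,\s*(\d+)\)", t)
    if decimal:
        return f"DECIMAL({decimal.group(1)},{decimal.group(2)})"
    if t.startswith("list<") and t.endswith(">"):
        return f"ARRAY<{lakehouse_type(t[5:-1])}>"
    if t.startswith("map<") and t.endswith(">"):
        parts = split_top_level(t[4:-1])
        if len(parts) != 2:
            raise ValueError(f"malformed map type: {iceberg_type!r}")
        key, value = parts
        # Per the table, the map key is inferred as NOT NULL.
        return f"MAP<{lakehouse_type(key)} NOT NULL, {lakehouse_type(value)}>"
    # Anything else (including struct) is outside this sketch.
    raise ValueError(f"type not covered by this sketch: {iceberg_type!r}")
```

For instance, `lakehouse_type("list<long>")` returns `ARRAY<BIGINT>`, and nested types are resolved recursively. A type that is not in the table raises `ValueError` rather than guessing a mapping.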
Validation Example
Differences from Paimon External Tables
| Feature | Paimon External Table | Iceberg External Catalog |
|---|---|---|
| Access method | CREATE EXTERNAL TABLE ... USING PAIMON LOCATION ... | CREATE EXTERNAL CATALOG ... CONNECTION ... |
| Metadata service | No additional service needed; reads Paimon schema files directly | Requires Catalog service (HMS / REST) |
| Access syntax | SELECT * FROM ext_table | SELECT * FROM catalog.schema.table |
| Table creation | Requires CREATE EXTERNAL TABLE | No table creation needed; auto-discovered through Catalog |
| Data file path | oss:// protocol | oss:// protocol (must be correctly configured in metadata.json) |
Notes
- Read-only: The current version only supports SELECT queries; INSERT / UPDATE / DELETE are not supported.
- Catalog accessibility: The Iceberg REST Catalog or HMS service must be accessible from Lakehouse compute nodes (requires public network access or internal network connectivity configured).
- Metadata path: Data file paths recorded in Iceberg metadata.json must use oss:// (not s3:// or file://); otherwise Lakehouse cannot read them.
- Catalog lifecycle: External Catalogs are bound to Catalog Connections; after modifying Connection configuration, the Catalog must be recreated.
- Connection reuse: Multiple Iceberg tables under the same storage account can share the same Catalog Connection.
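The metadata path requirement in the notes above can be checked before a table is exposed through the Catalog. The following is a minimal Python sketch, assuming the table's metadata.json has already been downloaded and parsed into a dictionary. The function name `non_oss_paths` and the sample document are hypothetical, and the check only looks at string values that contain `://`, so a path written as `file:/tmp/...` would not be flagged.

```python
from urllib.parse import urlparse


def non_oss_paths(node):
    """Recursively collect string values whose URI scheme is not oss (hypothetical helper)."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(non_oss_paths(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(non_oss_paths(value))
    elif isinstance(node, str) and "://" in node:
        if urlparse(node).scheme != "oss":
            found.append(node)
    return found


# Simplified, made-up metadata document used only for illustration.
sample_metadata = {
    "format-version": 2,
    "location": "s3://demo-bucket/warehouse/db/orders",
    "snapshots": [
        {"manifest-list": "oss://demo-bucket/warehouse/db/orders/metadata/snap-1.avro"}
    ],
}

offending = non_oss_paths(sample_metadata)
```

An empty result means every URI found in the document uses `oss://`. Any entries returned point to paths that would need to be rewritten in the Iceberg metadata before Lakehouse can read the table.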
