Databricks Unity Catalog Federation Query Practice
Singdata Lakehouse queries tables in Databricks Unity Catalog directly through an External Catalog. Data stays in Databricks' S3 storage without moving — Lakehouse handles SQL execution and result delivery. This guide uses an AWS environment as an example and walks through the complete configuration process from scratch.
Prerequisites
- Databricks workspace: Unity Catalog support required (Free Edition already supports it)
- Singdata Lakehouse instance: must be on the same cloud platform (both AWS) as the Databricks data storage (S3)
- Tools: cz-cli with the corresponding AWS profile pre-configured
SQL Commands Involved
| Command | Purpose |
|---|---|
CREATE CATALOG CONNECTION | Store Databricks OAuth M2M authentication credentials |
CREATE EXTERNAL CATALOG | Create an external catalog pointing to Databricks Unity Catalog |
SHOW SCHEMAS IN | List schemas in a Databricks Catalog |
SHOW TABLES IN | List tables in a schema |
SELECT | Query Databricks table data |
INSERT | Append data (append only; UPDATE / DELETE not supported) |
Databricks Configuration
Create a Service Principal
Open https://accounts.cloud.databricks.com → User management → Service principals → Add service principal, and give it a recognizable name (e.g., lakehouse_connector).
After creating the SP, click its name to enter the details page and complete the following:
- Roles tab → Enable Account admin
- Principal information tab → Record the Application ID (this is the
CLIENT_IDused later, in UUID format) - Credentials & secrets tab → Generate secret → Record the complete Secret generated (this is the
CLIENT_SECRETused later)
Add the SP to the Workspace
In the Databricks Workspace → Settings → Identity and access → Service Principals → Add service principal, and add the SP you just created.
Enable Metastore External Data Access
In the Databricks Workspace → Catalog → gear icon → Metastore → Details tab → find External data access → enable the toggle.
Without this option enabled, querying data will produce:
Grant Catalog and Schema Permissions
In Databricks Catalog Explorer, grant the following permissions to the SP:
- Catalog level:
USE CATALOG - Schema level:
USE SCHEMA,SELECT,EXTERNAL USE SCHEMA
If you have a SQL execution environment (Notebook or SQL Warehouse), you can also use GRANT commands (replace <application-id> with the SP's Application ID in UUID format):
Create a Catalog Connection
Verify the connection:
Create an External Catalog
The value of catalog is the name of the catalog in Databricks Unity Catalog. In the Databricks Workspace → left sidebar Catalog icon → expand the left panel. Catalogs listed under My organization are the available ones (e.g., workspace, main, hive_metastore).
Verify:
Sample output:
Querying Data
Federation Queries: Cross-Platform SQL Analysis
The core value of federation queries is: Lakehouse can directly query and join Databricks data, and can also write Databricks data into Lakehouse local tables — all without any data migration.
Scenario 1: JOIN Between Databricks Tables
orders_external (orders) and customers_external (customers) are joined via customer_id to calculate the order count and total spend per customer. Both tables have a price field of type DECIMAL(10,2), requiring no conversion:
Query results:
| customer_name | country | loyalty_tier | order_count | total_revenue |
|---|---|---|---|---|
| Alice Chen | China | Gold | 2 | 1698.99 |
| Frank Liu | China | Silver | 1 | 299.99 |
| Carol Zhang | China | Platinum | 2 | 249.98 |
| David Lee | Singapore | Bronze | 1 | 129.99 |
| Emma Wang | China | Silver | 1 | 89.99 |
Scenario 2: Low Inventory Alert
inventory_delta records inventory by warehouse (fields: product_id, warehouse_location, quantity_available). Query products with inventory below a threshold, summarized by warehouse:
Query results:
| warehouse_location | low_stock_products | min_stock | avg_stock |
|---|---|---|---|
| Warehouse A | 1 | 50 | 50.0 |
| Warehouse B | 2 | 75 | 112.5 |
Scenario 3: Writing Databricks Data to Lakehouse
Federation query results can be written directly into Lakehouse local tables for data consolidation. In the example below, public is a Lakehouse local schema (not Databricks) — replace it with your actual local schema name:
Query the local table with no network overhead:
| order_id | customer_id | order_date | product_name | price | status |
|---|---|---|---|---|---|
| 2006 | 1005 | 2026-05-20 | Webcam HD | 89.99 | Delivered |
| 2001 | 1001 | 2026-05-15 | Laptop Pro | 1299.99 | Delivered |
| 2004 | 1001 | 2026-05-18 | Monitor 27inch | 399.00 | Delivered |
Scenario 4: Dynamic Table Consuming Databricks Data
Use Databricks data as the upstream source for a Dynamic Table to periodically aggregate into Lakehouse. public is a Lakehouse local schema — replace it with your actual local schema name:
Query after the first refresh:
| order_date | order_count | total_revenue | avg_order_value |
|---|---|---|---|
| 2026-05-15 | 1 | 1299.99 | 1299.99 |
| 2026-05-17 | 1 | 49.99 | 49.99 |
| 2026-05-18 | 1 | 399.00 | 399.00 |
| 2026-05-19 | 1 | 129.99 | 129.99 |
| 2026-05-20 | 1 | 89.99 | 89.99 |
| 2026-06-04 | 2 | 499.98 | 249.99 |
Supported Table Types
Not all Databricks tables can be queried from Lakehouse. Support depends on the table's storage type:
| Table Type | Format | Supported | Notes |
|---|---|---|---|
TABLE_DELTA_EXTERNAL | Delta | Yes | Fully supported, recommended |
TABLE_DELTA | Delta | Yes | Fully supported |
TABLE_EXTERNAL (Delta format) | Delta | Yes | Supported |
TABLE_EXTERNAL (Parquet/CSV/JSON) | Non-Delta | No | Reports unsupported databricks table format |
TABLE_DB_STORAGE | Managed Delta | No | Cross-platform access not supported |
VIEW | — | No | Driver compatibility issue |
Key conclusion: Only Delta format tables support federation queries, whether External or regular Delta tables. External tables in Parquet, CSV, or JSON format are currently not supported.
When creating tables in Databricks, prefer Delta format:
Common Error Troubleshooting
invalid_client
OAuth authentication failed. Check in order:
- Has the SP enabled Account admin in Account Console → Roles?
- Has the SP been added in Workspace → Settings → Service Principals?
- Is the
CLIENT_SECRETthe complete value (not a masked value like50db****7f61)? Go to Account Console → SP → Credentials & secrets to generate a new one. - Has the Secret expired? Check the
Expires atfield on the Credentials & secrets page. If expired, generate a new Secret and re-runCREATE CATALOG CONNECTION.
PermissionDenied: External Data Access ... is disabled
The Metastore has not enabled external data access. In the Databricks Workspace → Catalog → gear icon → Metastore → External data access → enable.
PermissionDenied: User does not have USE CATALOG
The SP does not have Catalog access permission. In Databricks Catalog Explorer, find the corresponding Catalog → Permissions → Grant → add USE CATALOG for the SP.
PermissionDenied: User does not have EXTERNAL USE SCHEMA
The SP does not have external access permission for the Schema. In Databricks Catalog Explorer → Schema → Permissions → Grant → add USE SCHEMA, SELECT, and EXTERNAL USE SCHEMA for the SP.
NotFound: Catalog 'main' does not exist
The catalog name specified in OPTIONS ('catalog' = ...) does not exist. Open the Databricks Workspace → Catalog panel to check the actual catalog names.
Query Timeout (300 seconds) or PermanentRedirect
ACCESS_REGION is incorrect; S3 requests are being redirected. Check the table's actual storage location (Catalog Explorer → table details → Storage Location), confirm the S3 bucket's region, and recreate the Catalog Connection.
unsupported databricks table format {} [PARQUET/CSV/JSON]
The table uses a non-Delta format, which is currently not supported for querying from Lakehouse. Recreate the table in Databricks using Delta format, or convert the data to a Delta table.
Table cannot be accessed from outside of Databricks Compute Environment ... kind being TABLE_DB_STORAGE
The table is a Databricks Managed Table — data is stored in Databricks-controlled storage and does not support direct cross-platform access. The table must first be converted to an External Table in Databricks before it can be queried.
Important Notes
| Note | Description |
|---|---|
| Cloud platform restriction | Databricks' S3 storage must be on the same cloud platform as the Lakehouse instance (both AWS). Databricks on GCP/Azure cannot be interconnected with an AWS Lakehouse. |
| Region consistency | ACCESS_REGION must match the S3 bucket's region, not the workspace region. |
| Read-only restriction | External Catalogs are read-only — writing data from Lakehouse to Databricks (INSERT/UPDATE/DELETE) is not supported. The reverse (writing Databricks data into Lakehouse) is fully supported; see Scenario 3. |
| Version requirement | Requires a version that supports Unity Catalog. Free Edition is supported; Community Edition is not. |
| Save your Secret | The OAuth Secret is only shown in full at generation time — save it immediately. If lost, you must generate a new one. |
Related Documentation
- Create Catalog Connection — Complete DDL syntax and parameter descriptions
- Create Databricks External Catalog — External Catalog DDL syntax
- Federation Query Usage Guide — Complete federation query examples
