Create External Catalog

Steps to create a Hive External Catalog:

  1. Create Storage Connection: First, you need to create a storage connection to access the object storage service.
  2. Create Catalog Connection: Use the storage connection information and Hive Metastore address to create a Catalog Connection.
  3. Create External Catalog: Use the Catalog Connection to create an external Catalog to access external data in the data lake.

Syntax

CREATE EXTERNAL CATALOG catalog_name CONNECTION catalog_api_connection;

Parameter Description

  • catalog_api_connection: The name of the Catalog Connection. Supports HMS (Hive), Databricks Unity Catalog, Iceberg REST Catalog, and other types. Refer to creating Catalog Connection

Examples

Case 1: Hive ON OSS (Read and Write Supported)

  • Create storage connection

CREATE STORAGE CONNECTION if not exists catalog_storage_oss type OSS ACCESS_ID='LTAIxxxxxxxxxxxx' ACCESS_KEY='T8Gexxxxxxmtxxxxxx' ENDPOINT='oss-cn-hangzhou-internal.aliyuncs.com';

  • Create Catalog Connection Please ensure that the network between the server where HMS is located and the Lakehouse is connected. For specific connection methods, please refer to Creating Alibaba Cloud Endpoint Service

CREATE CATALOG CONNECTION if not exists catalog_api_connection type hms hive_metastore_uris='xxxx:9083' storage_connection='catalog_storage_oss';

  • Create Catalog

CREATE EXTERNAL CATALOG my_external_catalog CONNECTION catalog_api_connection;

  • Verify connectivity to Hive Catalog

-- Verify reading metadata SHOW SCHEMAS IN my_external_catalog; +---------------------------------------------------------------------------+ | schema_name | +---------------------------------------------------------------------------+ | air_travel | | all_data | | automobile | | automv_schema | | bigquant | +---------------------------------------------------------------------------+ -- Verify reading data, STORAGE CONNECTION permission will be used when reading data SELECT * FROM my_external_catalog.my_schema.my_table;

Case 2: Hive ON COS (Read and Write Supported)

  • Create storage connection

CREATE STORAGE CONNECTION catalog_storage_cos TYPE COS ACCESS_KEY = '<access_key>' SECRET_KEY = '<secret_key>' REGION = 'ap-shanghai' APP_ID = '1310000503';

Parameters: * TYPE: The object storage type, for Tencent Cloud, fill in COS (case insensitive) * ACCESS_KEY / SECRET_KEY: The access keys for Tencent Cloud, refer to: Access Keys * REGION: Refers to the region where the Tencent Cloud Object Storage COS data center is located. When Singdata Lakehouse accesses Tencent Cloud COS within the same region, the COS service will automatically route to internal network access. For specific values, please refer to the Tencent Cloud documentation: Regions and Access Domains.

  • Create Catalog Connection Please ensure that the network between the HMS server and Lakehouse is connected. For specific connection methods, refer to Create Tencent Cloud Endpoint Service

CREATE CATALOG CONNECTION if not exists catalog_api_connection type hms hive_metastore_uris='xxxx:9083' storage_connection='catalog_storage_cos';

  • Create Catalog

CREATE EXTERNAL CATALOG my_external_catalog CONNECTION catalog_api_connection;

  • Verify connectivity to Hive Catalog

-- Verify reading metadata SHOW SCHEMAS IN my_external_catalog; +---------------------------------------------------------------------------+ | schema_name | +---------------------------------------------------------------------------+ | air_travel | | all_data | | automobile | | automv_schema | | bigquant | +---------------------------------------------------------------------------+ -- Verify reading data, STORAGE CONNECTION permission will be used when reading data SELECT * FROM my_external_catalog.my_schema.my_table;

Case 3: Hive ON S3 (Read and Write Supported)

  • Create storage connection

CREATE STORAGE CONNECTION catalog_storage_s3 TYPE S3 ACCESS_KEY = 'AKIAQNBSBP6EIJE33***' SECRET_KEY = '7kfheDrmq***************************' ENDPOINT = 's3.cn-north-1.amazonaws.com.cn' REGION = 'cn-north-1';

Parameters:

  • TYPE: The object storage type, AWS should be filled in as S3 (case insensitive)
  • ACCESS_KEY / SECRET_KEY: The access key for AWS, refer to: Access Keys for how to obtain it
  • ENDPOINT: The service address for S3, AWS China is divided into Beijing and Ningxia regions. The service address for S3 in the Beijing region is s3.cn-north-1.amazonaws.com.cn, and for the Ningxia region, it is s3.cn-northwest-1.amazonaws.com.cn. Refer to: China Region Endpoints to find the endpoints for the Beijing and Ningxia regions -> Amazon S3 corresponding endpoints
  • REGION: AWS China is divided into Beijing and Ningxia regions, the region values are: Beijing region cn-north-1, Ningxia region cn-northwest-1. Refer to: China Region Endpoints
  • Create Catalog Connection

CREATE CATALOG CONNECTION if not exists catalog_api_connection type hms hive_metastore_uris='xxxx:9083' storage_connection='catalog_storage_s3';

  • Create Catalog

CREATE EXTERNAL CATALOG my_external_catalog CONNECTION catalog_api_connection;

  • Verify connectivity to Hive Catalog

-- Verify reading metadata SHOW SCHEMAS IN my_external_catalog; +---------------------------------------------------------------------------+ | schema_name | +---------------------------------------------------------------------------+ | air_travel | | all_data | | automobile | | automv_schema | | bigquant | +---------------------------------------------------------------------------+ -- Verify reading data, STORAGE CONNECTION permission will be used when reading data SELECT * FROM my_external_catalog.my_schema.my_table;

Case 4: Hive ON HDFS (Read Support)

  • Create Storage Connection

    CREATE STORAGE CONNECTION hdfs_conn TYPE HDFS NAME_NODE='zetta-cluster' NAME_NODE_RPC_ADDRESSES=['11.110.239.148:8020'];

    • TYPE HDFS: Specifies the connection type as HDFS.
    • NAME_NODE: Corresponds to dfs.nameservices in the HDFS configuration, which is the logical name of the HDFS cluster, such as zetta-cluster.
    • NAME_NODE_RPC_ADDRESSES: Corresponds to dfs.namenode.rpc-address in the HDFS configuration, which is the RPC address of the NameNode, formatted as [<host>:<port>], such as ['11.110.239.148:8020'].
  • Create Catalog Connection

    CREATE CATALOG CONNECTION if not exists catalog_api_connection type hms hive_metastore_uris='xxxx:9083' storage_connection='hdfs_conn';

  • Create Catalog

    CREATE EXTERNAL CATALOG my_external_catalog CONNECTION catalog_api_connection;

  • Verify Connectivity to Hive Catalog

    -- Verify metadata reading SHOW SCHEMAS IN my_external_catalog; -- Verify data reading; data is read using the permissions of the STORAGE CONNECTION SELECT * FROM my_external_catalog.my_schema.my_table;

Create Databricks External Catalog

Steps to create Databricks External Catalog:

  1. Create Catalog Connection: Store Databricks' Unity Catalog connection authentication information.
  2. Create External Catalog: Use Catalog Connection to create an external Catalog to access external data in the data lake.

Syntax

CREATE EXTERNAL CATALOG catalog_name CONNECTION catalog_api_connection OPTIONS ('catalog'='catalog_name');

Parameter Description

  • catalog_name: The name of the external Catalog. This name is used to identify the Catalog, and it must be unique and comply with naming conventions.
  • CONNECTION catalog_api_connection: Specifies the connection to the external Catalog. catalog_api_connection is the name of a pre-created connection used to access the external Catalog.
  • OPTIONS ('catalog'='catalog_name'): Specifies the configuration options for the external Catalog. 'catalog'='catalog_name': Indicates the name of the external Catalog, catalog_name is the name of the target Catalog.