Batch Upload Data Using Python SDK
This document describes how to use BulkloadStream in the Python SDK to batch load data into Lakehouse. The method suits importing large amounts of data at once and supports custom data sources, which gives you flexibility in how the data is imported. The example uses a local CSV file. If the data source is in object storage or is covered by the data integration feature of Lakehouse Studio, it is recommended to use the COPY command or data integration instead.
Reference Documentation
Uploading Data with Python SDK
Application Scenarios
- Suitable for business scenarios that require batch data uploads.
- Suitable for developers familiar with Python who need to customize data import logic.
Usage Restrictions
- BulkloadStream does not support writing to primary key (pk) tables.
- Not suitable for frequent uploads at intervals shorter than five minutes.
Use Case
This example uses the olist_order_payments_dataset from the Brazilian E-commerce public dataset.
Prerequisites
- Create the target table bulk_order_payments.
- Have INSERT permission on the target table.
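The target table can be sketched as follows. The column names come from the header of the public olist_order_payments_dataset CSV; the column types, the `IF NOT EXISTS` clause, and the helper `build_create_table_sql` are assumptions made for illustration, not the table definition used by the original example. No primary key is declared, since BulkloadStream does not write to primary key tables.

```python
# Column names follow the CSV header of olist_order_payments_dataset.
# The SQL types are assumptions chosen for illustration.
COLUMNS = [
    ("order_id", "STRING"),
    ("payment_sequential", "INT"),
    ("payment_type", "STRING"),
    ("payment_installments", "INT"),
    ("payment_value", "DOUBLE"),
]


def build_create_table_sql(table, columns):
    """Return a CREATE TABLE statement without a primary key (hypothetical helper)."""
    body = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{body}\n);"


if __name__ == "__main__":
    # Print the statement so it can be reviewed and run in a SQL client.
    print(build_create_table_sql("bulk_order_payments", COLUMNS))
```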
The Lakehouse connection takes the following parameters:

| Parameter | Required | Description |
|---|---|---|
| username | Y | Username |
| password | Y | Password |
| service | Y | Address used to connect to Lakehouse, in the form region_id.api.clickzetta.com. You can find it in the JDBC connection string shown in Lakehouse Studio under Administration -> Workspace. |
| instance | Y | Instance to connect to. You can find it in the JDBC connection string shown in Lakehouse Studio under Administration -> Workspace. |
| workspace | Y | Workspace in use |
| vcluster | Y | Virtual Cluster in use |
| schema | Y | Name of the schema to access |
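A minimal sketch of how these parameters might be collected and checked before connecting. `REQUIRED_PARAMS`, `validate_params`, and `open_connection` are hypothetical names; the call to `clickzetta.connect` with these keyword arguments is an assumption about the SDK and should be checked against the SDK reference.

```python
# All seven parameters in the table above are marked as required.
REQUIRED_PARAMS = (
    "username", "password", "service", "instance",
    "workspace", "vcluster", "schema",
)


def validate_params(params):
    """Return only the required parameters, or raise if any is missing or empty."""
    missing = [key for key in REQUIRED_PARAMS if not params.get(key)]
    if missing:
        raise ValueError("missing connection parameters: " + ", ".join(missing))
    return {key: params[key] for key in REQUIRED_PARAMS}


def open_connection(params):
    """Open a Lakehouse connection (assumed SDK entry point)."""
    # Assumption: the SDK exposes connect() taking these keyword arguments.
    # The import is local so the validation helper works without the SDK.
    from clickzetta import connect
    return connect(**validate_params(params))
```

Validating up front turns a missing value into an immediate, readable error instead of a failure partway through the load.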
Develop with Python Code
Use pip to install the Lakehouse Python package dependencies. Python 3.10 or later is required.
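The version requirement can be verified before installing anything. The sketch below uses only the standard library; `MIN_PYTHON` and `python_is_supported` are hypothetical names introduced for illustration.

```python
import sys

# Minimum interpreter version stated in this document.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_is_supported() else "too old"
    print(f"Python {current}: {status}")
```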
Writing Python Code
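The write step might look like the sketch below: read the local CSV file and hand each record to a bulkload writer. The writer interface (`create_row`, `set_value`, `write`) is an assumption about the SDK, as are the column types in `COLUMN_CASTS`; `write_csv_rows` is a hypothetical helper. The function accepts any object with those three methods, so it can be exercised without a Lakehouse connection.

```python
import csv

# Column names follow the CSV header; the Python types are assumptions
# and should match the types of the target table.
COLUMN_CASTS = {
    "order_id": str,
    "payment_sequential": int,
    "payment_type": str,
    "payment_installments": int,
    "payment_value": float,
}


def write_csv_rows(writer, csv_path, casts=COLUMN_CASTS):
    """Write every CSV record through the writer and return the row count."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # DictReader consumes the header line and keys each record by column name.
        for record in csv.DictReader(handle):
            row = writer.create_row()
            for name, cast in casts.items():
                row.set_value(name, cast(record[name]))
            writer.write(row)
            count += 1
    return count
```

With the SDK, the writer would presumably come from a bulkload stream opened on the connection, for example `stream = conn.create_bulkload_stream(schema=..., table="bulk_order_payments")` followed by `writer = stream.open_writer(0)`; both calls are assumptions to verify against the SDK documentation.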
Commit to complete the data import.
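One way to order the final steps is sketched below: always close the writer, and commit only when every row was written without error. The method names `writer.close()` and `stream.commit()` are assumptions about the SDK; `finish_bulkload` and its `write_fn` callback are hypothetical.

```python
def finish_bulkload(stream, writer, write_fn):
    """Run write_fn(writer), close the writer, then commit the stream.

    If write_fn raises, the writer is still closed, but the stream is
    not committed and the exception propagates to the caller.
    """
    try:
        written = write_fn(writer)
    finally:
        # Close the writer whether or not the write succeeded.
        writer.close()
    # Reached only when write_fn returned normally.
    stream.commit()
    return written
```

Skipping the commit on failure keeps a partially written batch from being marked as complete.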


