Lakehouse File Batch Import/Export Guide (COPY INTO)
Overview
COPY INTO is a high-performance batch data import/export command provided by Singdata Lakehouse. It supports batch loading of CSV, Parquet, ORC, and other format files from Volumes into tables, as well as exporting table data to files. This guide categorizes usage by business scenario to help you quickly master efficient data transfer methods.
SQL Commands Covered
| Command | Purpose | Applicable Scenario |
|---|---|---|
COPY INTO table FROM VOLUME ... | Import files from Volume | Batch load CSV/Parquet/ORC |
COPY INTO USER VOLUME ... FROM table | Export table data to User Volume | Data backup, external system exchange |
COPY INTO VOLUME name ... FROM table | Export table data to external Volume | Export to OSS/COS/S3 |
load_history('table_name') | View load history | Audit import records and file status |
Prerequisites
The following examples use a simulated target table doc_copy_target:
The examples use a Volume named my_volume. Before running, replace my_volume with an actual Volume name. You can view available Volumes with SHOW VOLUMES.
Import CSV from Volume
Use COPY INTO to batch import CSV files from a Volume.
Import a specific file by name:
Common OPTIONS parameters:
header = 'true': Skip the first header rowsep = ',': Specify the column delimiter (default is comma)quote = '"': Specify the quote character
Import Parquet from Volume
Parquet is a columnar storage format whose import performance typically exceeds CSV.
Advantages:
- Automatic schema inference, no manual column mapping required
- High compression ratio, faster network transfer and parsing
Export Table Data to Volume
Export to User Volume:
Export to external Volume (requires an External Volume created in advance):
Applicable scenarios:
- Data backup to object storage
- Provide data files for downstream systems
- Machine learning feature export
Handle Import Errors
Use the ON_ERROR parameter to control the handling strategy when format errors are encountered.
Available strategies:
CONTINUE: Skip erroneous rows and continue importing the remaining data. Suitable for scenarios that tolerate partial errors and require maximum load completion.ABORT: Immediately terminate the entire import operation on error. Suitable for strict data quality requirements where any error requires manual intervention.
When ON_ERROR is specified, the command returns the import result for each file with the following columns:
| Column | Description |
|---|---|
file | Full path of the imported file |
status | Import status: SUCCESS or LOADED_FAILED |
rows_loaded | Number of successfully imported rows |
first_error | First error message (empty on success) |
View Load History
Use the load_history function to view file import records for a table.
Returned column descriptions:
| Column | Description |
|---|---|
file_path | Path of the imported file |
last_copy_time | Most recent import time |
file_size | File size (bytes) |
status | Import status |
first_error_message | First error message |
Clean Up Test Data
Important Notes
- Idempotency:
COPY INTOrecords imported file paths; repeated execution will not re-import the same files. To force a re-import, useCOPY OVERWRITE INTO. - Schema Matching: The column order in imported files must match the table definition, or use
MATCH_BY_COLUMN_NAME = 'CASE_SENSITIVE'to match by column name. - Large File Splitting: Split very large files into multiple smaller files (100MB–1GB) to improve parallel import performance.
- Permission Requirements: Executing
COPY INTOrequiresINSERTpermission on the target table andREADpermission on the Volume. - Export Path: Exporting to an existing path appends files by default and does not overwrite existing files. Use
COPY OVERWRITE INTOto overwrite.
