💡 If your goal with DataX is to batch-sync data into Singdata Lakehouse, Singdata Studio provides a visual offline sync solution that requires no JSON configuration files.
If you need real-time rather than batch synchronization, note that DataX does not support CDC; use a real-time sync task instead.
If you already have DataX jobs or need DataX-specific transformation capabilities, continue reading the integration guide below.
DataX Introduction
DataX is an open-source data synchronization tool from Alibaba that supports many data sources, including relational databases, HDFS, Hive, MaxCompute, HBase, FTP, and local files. This document describes how to use the DataX ClickZettaWriter plugin to write data read by DataX into Singdata Lakehouse.
Usage Restrictions
The vector and json data types are not supported.
Preparations
Make sure DataX is installed. For installation instructions, refer to the DataX User Guide.
Download the DataX ClickZettaWriter plugin from the following address: DataX ClickZettaWriter Plugin. Unzip the plugin into the plugin/writer directory under the DataX installation directory.
Before using the DataX ClickZettaWriter plugin, make sure the target table has already been created in Singdata Lakehouse.
Using the DataX ClickZettaWriter Plugin
1. Create Configuration File
The following example demonstrates how to use the DataX ClickZettaWriter plugin to synchronize MySQL data to Singdata Lakehouse.
mysqlreader: The built-in mysqlreader plugin in DataX, used for reading MySQL data. For specific usage, please refer to the mysqlreader plugin documentation.
clickzettawriter parameters:
jdbcUrl: The Singdata Lakehouse JDBC connection string.
table: The name of the table to write to (only one target table is supported).
column: The names of the columns to write to (an asterisk, *, selects all columns).
partitionColumns: The names of the partition columns, used when writing to a partitioned table (the columns listed in column together with the partition columns must cover every column of the table).
writeMode: The write mode. Valid values are append, overwrite, and upsert; the default is append.
username: The Singdata Lakehouse username.
password: The Singdata Lakehouse password.
preSql: SQL statements to execute before writing.
postSql: SQL statements to execute after writing.
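The parameters above can be assembled into a writer block programmatically. The following is a minimal Python sketch, not the plugin's official template: the helper name build_writer is hypothetical, the flat key layout simply mirrors the parameter list, and the value types (for example a string for table and a list for column) are assumptions to verify against the plugin's bundled documentation.

```python
# Sketch only: key layout and value types are assumptions based on the
# parameter list in this guide, not a verified plugin template.
VALID_WRITE_MODES = {"append", "overwrite", "upsert"}


def build_writer(jdbc_url, table, username, password, column=None,
                 partition_columns=None, write_mode="append",
                 pre_sql=None, post_sql=None):
    """Return a DataX writer block for clickzettawriter as a plain dict."""
    if write_mode not in VALID_WRITE_MODES:
        raise ValueError(
            f"writeMode must be one of {sorted(VALID_WRITE_MODES)}, got {write_mode!r}"
        )
    parameter = {
        "jdbcUrl": jdbc_url,
        "table": table,              # the plugin writes to exactly one table
        "column": column or ["*"],   # "*" selects all columns
        "writeMode": write_mode,     # defaults to append
        "username": username,
        "password": password,
    }
    # Optional keys are emitted only when the caller supplies them.
    if partition_columns:
        parameter["partitionColumns"] = partition_columns
    if pre_sql:
        parameter["preSql"] = pre_sql
    if post_sql:
        parameter["postSql"] = post_sql
    return {"name": "clickzettawriter", "parameter": parameter}
```

Rejecting an unknown writeMode before the job is submitted surfaces a typo immediately instead of after DataX has started reading from the source.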
2. Execute the Synchronization Task
Run the following command to execute the synchronization task:
python bin/datax.py job.json
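When the job is launched from a scheduler or another script, the same command can be wrapped in Python. This is a rough sketch under stated assumptions: build_datax_command and run_job are hypothetical helper names, the DataX home path is supplied by the caller, and the launcher is assumed to run under the same interpreter as the calling script (some DataX releases expect a specific Python version for datax.py).

```python
import subprocess
import sys
from pathlib import Path


def build_datax_command(datax_home, job_file):
    """Return the argv list equivalent to `python bin/datax.py job.json`."""
    launcher = Path(datax_home) / "bin" / "datax.py"
    return [sys.executable, str(launcher), str(job_file)]


def run_job(datax_home, job_file):
    """Run the DataX job and return the launcher's exit code."""
    # Output is inherited, so the DataX log streams to the console as usual.
    result = subprocess.run(build_datax_command(datax_home, job_file))
    return result.returncode
```

A caller would invoke run_job with the DataX installation directory and the path to job.json, then treat a nonzero return value as a failed run; that interpretation of the exit code is also an assumption to confirm for your DataX version.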
Usage Example
Example 1: Sync MySQL Data to Singdata Lakehouse
The following configuration file example synchronizes the test_table data in MySQL to the example_table in Singdata Lakehouse.
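A configuration of this shape can be sketched in Python and written out as job.json. The job/setting/content envelope and the mysqlreader block follow the standard DataX job layout; the clickzettawriter block uses an assumed flat layout derived from the parameter list in this guide, and every connection value in angle brackets is a placeholder rather than a real endpoint or credential. The helper name write_job is hypothetical.

```python
import json

# Sketch only: placeholders in angle brackets must be replaced, and the
# writer's key layout is an assumption based on this guide's parameter list.
job = {
    "job": {
        "setting": {"speed": {"channel": 1}},  # one concurrent channel
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "<mysql-user>",
                        "password": "<mysql-password>",
                        "column": ["*"],
                        "connection": [
                            {
                                "table": ["test_table"],
                                "jdbcUrl": ["jdbc:mysql://<host>:3306/<database>"],
                            }
                        ],
                    },
                },
                "writer": {
                    "name": "clickzettawriter",
                    "parameter": {
                        "jdbcUrl": "<lakehouse-jdbc-url>",
                        "table": "example_table",
                        "column": ["*"],
                        "writeMode": "append",
                        "username": "<lakehouse-user>",
                        "password": "<lakehouse-password>",
                    },
                },
            }
        ],
    }
}


def write_job(path="job.json"):
    """Serialize the job definition to a DataX configuration file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(job, fh, indent=2)
    return path
```

Calling write_job() produces a job.json that can then be passed to bin/datax.py once the placeholders are filled in and example_table exists in Singdata Lakehouse.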