Ecosystem Tool Connections

Singdata Lakehouse can be reached through JDBC drivers, Python and Java SDKs, and Spark and Flink Connectors. Together these cover the mainstream SQL clients, BI tools, and ETL platforms listed below. Choose the approach that fits your use case.


I want to connect with a SQL client

Recommended: DBeaver or DataGrip. Both connect via the JDBC driver and support SQL editing, schema browsing, and data export.

| Tool | Description | Reference |
| --- | --- | --- |
| DBeaver | Free and open-source; the community edition is sufficient for everyday queries and data exploration | DBeaver Connection Guide |
| DataGrip | From JetBrains; strong code completion and SQL analysis | DataGrip Connection Guide |
| SQL Workbench/J | Lightweight; suitable when you only need basic SQL execution | SQL Workbench/J Connection Guide |

All of the above connect via the JDBC driver. Connection string format:

jdbc:clickzetta://<instance_name>.<region_id>.api.singdata.com/<workspace_name>?username=<user>&password=<pwd>&virtualCluster=default

See JDBC Driver for details.
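
As a minimal sketch, the connection string above can be assembled from its parts in Python. The helper name `build_jdbc_url` and the sample values are hypothetical. Percent-encoding the credentials is an assumption: this page does not state how the driver decodes special characters in the query string, so check the JDBC Driver documentation before relying on it.

```python
from urllib.parse import quote, urlencode


def build_jdbc_url(instance_name, region_id, workspace_name,
                   user, password, virtual_cluster="default"):
    """Assemble a JDBC URL in the format shown above (hypothetical helper)."""
    host = f"{instance_name}.{region_id}.api.singdata.com"
    # ASSUMPTION: the driver accepts percent-encoded query values, so that
    # characters such as '&' or '=' in a password do not split the query.
    query = urlencode(
        {
            "username": user,
            "password": password,
            "virtualCluster": virtual_cluster,
        },
        quote_via=quote,
    )
    return f"jdbc:clickzetta://{host}/{quote(workspace_name, safe='')}?{query}"


# Placeholder values only; substitute your own instance, region, and workspace.
url = build_jdbc_url("my_instance", "my_region", "my_workspace", "alice", "p&ss=1")
print(url)
```

Paste the resulting string into the URL field of DBeaver, DataGrip, or SQL Workbench/J. Avoid committing a URL that embeds a real password to version control.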


I want to use a BI tool for data visualization

| Tool | Description | Reference |
| --- | --- | --- |
| FineBI | A leading BI platform in China; connects via JDBC and is well-suited for internal enterprise reporting | FineBI Connection Guide |
| Tableau | Connects via JDBC; ideal for complex visualizations and exploratory analysis | Tableau Connection Guide |
| Metabase | Open-source and easy to deploy; suitable for self-service analytics in small to mid-sized teams | Metabase Connection Guide |
| Apache Superset | Open-source; supports SQLAlchemy connections; suitable for teams with operational capacity | Superset Connection Guide |
| Rath | Open-source intelligent analytics tool with automatic insight generation | Rath Connection Guide |
| Streamlit | Python data application framework; lets data science teams build apps quickly | Streamlit Connection Guide |
| Zeppelin | Notebook-style interface; suitable for data exploration and reporting | Zeppelin Connection Guide |

I want to use an ETL tool for data integration

| Tool | Description | Reference |
| --- | --- | --- |
| DataX | Open-sourced by Alibaba; suitable for offline batch data synchronization with simple configuration | DataX Integration Guide |
| dbt | Data transformation tool; ideal for SQL modeling and data transformation inside Singdata Lakehouse | dbt Integration Guide |
| Airbyte | Open-source ELT platform with a rich connector library; suitable for aggregating data from multiple sources | Airbyte Integration Guide |

Choosing the right tool:

  • Syncing data from a single source → DataX
  • Data modeling and transformation inside Singdata Lakehouse → dbt
  • Connecting to multiple SaaS data sources (Salesforce, HubSpot, etc.) → Airbyte

I want to connect programmatically

| Method | Language | Description | Reference |
| --- | --- | --- | --- |
| JDBC Driver | Java / any JVM language | Standard JDBC interface; supports SQL queries and DML | JDBC Driver |
| Python SDK | Python | PEP 249-compliant; supports SQL queries, bulk writes (bulkload), and real-time writes | Python SDK |
| Java SDK | Java | Supports bulk writes (BulkLoad) and real-time streaming writes (RealtimeStream) | Java SDK Bulk Upload · Java SDK Real-time Upload |
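
The Python SDK is described above as PEP 249-compliant. As a rough illustration of what that interface looks like, the sketch below uses the standard library's `sqlite3` module purely as a stand-in DB-API driver. It is not the Singdata SDK: the SDK's own `connect()` arguments and parameter placeholder style are not shown on this page and may differ. The `run_query` helper and the `events` table are hypothetical.

```python
import sqlite3  # stand-in PEP 249 driver; NOT the Singdata Python SDK


def run_query(conn, sql, params=()):
    """Run one query through the PEP 249 cursor interface and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
setup = conn.cursor()
setup.execute("CREATE TABLE events (id INTEGER, name TEXT)")
# sqlite3 uses '?' placeholders; other PEP 249 drivers may use a different style.
setup.executemany("INSERT INTO events VALUES (?, ?)", [(1, "signup"), (2, "login")])
conn.commit()

rows = run_query(conn, "SELECT id, name FROM events ORDER BY id")
print(rows)  # [(1, 'signup'), (2, 'login')]
conn.close()
```

Code written against `connect()`, `cursor()`, `execute()`, and `fetchall()` in this way should need only the connection step changed when moving to another PEP 249 driver, apart from differences in placeholder style.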

Choosing a write mode:

  • Offline bulk import (GB-scale or larger) → BulkLoad (Java SDK or Python SDK bulkload)
  • Real-time row-by-row writes (millisecond latency) → RealtimeStream (Java SDK) or Python SDK real-time upload
  • Standard SQL INSERT → JDBC

I want to process data with a compute engine

| Engine | Description | Reference |
| --- | --- | --- |
| Apache Spark | Read and write Singdata Lakehouse tables via the Spark Connector; supports the DataFrame API and spark-sql | Spark Connector |
| Apache Flink | Write to Singdata Lakehouse via the Flink Connector; supports CDC scenarios and append-only mode; sink tables only (write) | Flink Connector |

Two Flink Connector modes:

  • igs-dynamic-table: supports CDC (insert / update / delete); the target table must have a primary key
  • igs-dynamic-table-append-only: append only, no updates or deletes; the target table is a regular table
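
The choice between the two modes can be sketched as a small helper. The function name and its arguments are hypothetical. It only encodes the rule stated above and does not reflect any Flink Connector API.

```python
def pick_flink_connector(needs_cdc: bool, has_primary_key: bool) -> str:
    """Return the connector mode name that matches the two rules above."""
    if needs_cdc:
        # CDC mode applies inserts, updates, and deletes, so the target
        # table must have a primary key.
        if not has_primary_key:
            raise ValueError(
                "igs-dynamic-table requires a primary key on the target table"
            )
        return "igs-dynamic-table"
    # Append-only mode never updates or deletes; per the text above it
    # targets a regular table.
    return "igs-dynamic-table-append-only"


print(pick_flink_connector(needs_cdc=True, has_primary_key=True))
print(pick_flink_connector(needs_cdc=False, has_primary_key=False))
```

Raising an error for CDC without a primary key surfaces the mismatch at configuration time rather than when the job starts writing.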

Other

| Tool | Description | Reference |
| --- | --- | --- |
| MindsDB | Machine learning platform; run predictions directly on Singdata Lakehouse data | MindsDB Integration Guide |

For tools not listed here, you can create a custom connection using the JDBC driver or SQLAlchemy, depending on what connection methods the tool supports.


Not sure which approach to use?

What is your use case?
├── Interactive SQL queries / data exploration
│   ├── GUI client → DBeaver or DataGrip
│   └── Command line → cz-cli
├── Data visualization / reporting
│   ├── Internal enterprise reporting → FineBI
│   ├── Exploratory analysis → Tableau / Metabase
│   └── Custom applications → Streamlit / Superset
├── Data integration / ETL
│   ├── Offline batch sync → DataX
│   ├── SQL modeling and transformation → dbt
│   └── Multiple SaaS data sources → Airbyte
├── Programmatic access
│   ├── Java applications → JDBC Driver or Java SDK
│   └── Python applications → Python SDK
└── Compute engine
    ├── Batch processing / ML → Spark Connector
    └── Stream processing / CDC → Flink Connector
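
For teams that want to embed this guidance in internal tooling, the same decision tree can be held as a nested mapping. This is only an illustrative sketch: the names `DECISION_TREE` and `recommend` are hypothetical, and the entries simply mirror the tree above.

```python
# Nested mapping: use case -> scenario -> recommended tool(s).
DECISION_TREE = {
    "Interactive SQL queries / data exploration": {
        "GUI client": "DBeaver or DataGrip",
        "Command line": "cz-cli",
    },
    "Data visualization / reporting": {
        "Internal enterprise reporting": "FineBI",
        "Exploratory analysis": "Tableau / Metabase",
        "Custom applications": "Streamlit / Superset",
    },
    "Data integration / ETL": {
        "Offline batch sync": "DataX",
        "SQL modeling and transformation": "dbt",
        "Multiple SaaS data sources": "Airbyte",
    },
    "Programmatic access": {
        "Java applications": "JDBC Driver or Java SDK",
        "Python applications": "Python SDK",
    },
    "Compute engine": {
        "Batch processing / ML": "Spark Connector",
        "Stream processing / CDC": "Flink Connector",
    },
}


def recommend(use_case: str, scenario: str) -> str:
    """Look up the recommendation for a (use case, scenario) pair."""
    try:
        return DECISION_TREE[use_case][scenario]
    except KeyError:
        raise ValueError(f"no recommendation for {use_case!r} / {scenario!r}") from None


print(recommend("Data integration / ETL", "Offline batch sync"))  # DataX
```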