Overview
Singdata Lakehouse is a next-generation cloud lakehouse independently developed by Singdata. Built on an incremental computing engine, it delivers up to 10x performance improvement over traditional open-source architectures like Spark, enabling full-chain, low-cost, real-time processing of massive data. The platform supports the integration, storage, and computation of all data types, providing solid data infrastructure for AI innovation and helping enterprises upgrade from traditional Spark systems to the AI era.
For enterprises with existing data lakes (OSS / S3 / COS), Singdata Lakehouse can mount existing object storage directly and run federated queries over data in Hive, Iceberg, Delta Lake, and other formats via External Catalog, gaining high-performance SQL analytics with no data migration. This is the lowest-cost path from a data lake to a unified lakehouse.
Singdata Lakehouse supports seven global clouds, is already available in multiple Asia-Pacific regions, and also supports private deployment. Infrastructure costs are reduced to 1/5–1/3 of those of traditional solutions, with near-zero operational overhead.
Migration Guide · Migration Best Practices · SQL Syntax Comparison · Spark Connector · Performance Benchmarks
On-Site Acceleration Implementation Guide · External Catalog Federation Query · External Tables · Object Storage Mount · Performance Benchmarks
Lakehouse AI Overview · AI Data Preparation · Vector Search · AI Gateway · Data Analytics Agent · Data Engineering Agent
Supported Cloud Platforms and Regions
First Time Here?
| Time | What You Will Do | Link |
|---|---|---|
| 5 minutes | Register an account, activate a service instance, and complete initial setup | Get Started → |
| 30 minutes | Walk through data ingestion, SQL querying, and Dynamic Table incremental computing | Start Exploring → |
| On demand | Dedicated paths for data engineers, analysts, AI engineers, and administrators | Choose Your Path → |
Who Are You and What Do You Want to Do?
| Role / Scenario | Typical Tasks | Recommended Starting Point |
|---|---|---|
| Data Integration / Data Sync | Data ingestion, CDC sync, file import, streaming writes | Studio Data Integration (visual config for 40+ data sources) · Real-Time Sync Tasks (MySQL / PG / Oracle full-database CDC) · Offline Sync Tasks (scheduled batch sync) · Pipe Continuous Ingestion (object storage / Kafka auto-write) · COPY INTO (one-time file import) · Complete Data Ingestion Guide |
| Data Engineer | Build data pipelines, ETL processing, manage data warehouse layers | Dynamic Table Incremental Computing · Dynamic Table Overview · Real-Time Data Pipeline · Studio Task Development & Scheduling · DDL Syntax Reference · SQL Reference · cz-cli Command Line Tool · Data Engineering Agent · TPC-DS Benchmark |
| Data Analyst | SQL queries, BI connections, ad-hoc analysis | Run Your First SQL Query · Connect BI Tools · Data Analytics Agent (natural language queries) · Semantic Views · SSB Benchmark · TPC-H Benchmark |
| AI / ML Engineer | Vector search, RAG, AI functions, model invocation | AI Data Preparation · Vector Search · AI Functions (AI_COMPLETE / AI_EMBEDDING) · AI Gateway · Python SDK · ZettaPark (DataFrame API) |
| Platform Administrator | User management, permissions, compute clusters, cost control | Account and Service Instance Setup · User and Permission Management · Compute Cluster Management |
| AI Agent / Automation | Deterministic API calls, semantic layer queries, automated data pipelines | cz-cli Command Line Tool (deterministic interface, suitable for Agent calls) · Semantic Views (business semantic layer) · Python SDK · ZettaPark · Data Analytics Agent · Data Engineering Agent · Singclaw |
Core Capabilities
Data Ingestion
40+ data sources ready out of the box: MySQL / PG / Oracle full-database CDC real-time sync, Kafka streaming writes, OSS / S3 / COS continuous file ingestion, COPY INTO one-time batch import.
Data Ingestion Guide · Studio Data Integration · Pipe · COPY INTO
Unified Lakehouse
No migration needed for existing data lakes (OSS / S3 / COS). Mount existing object storage directly and run federated queries over Hive, Iceberg, and Delta Lake data via External Catalog to gain high-performance SQL analytics.
External Catalog · External Volume · On-Site Acceleration Guide
Incremental Computing
Define transformation logic in standard SQL. Dynamic Table automatically detects upstream changes and refreshes incrementally, replacing manual scheduling scripts to build low-latency data pipelines.
Incremental Computing · Dynamic Table Overview · Real-Time Data Pipeline
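The general idea behind incremental refresh can be sketched in a few lines of Python. This is a conceptual illustration only, not Singdata's engine or Dynamic Table syntax: the function names `full_recompute` and `apply_inserts` and the `(key, amount)` row format are hypothetical, and only newly inserted rows are modeled. The sketch maintains a per-key sum and, on refresh, touches only the keys that appear in the new batch instead of rescanning every row.

```python
from collections import defaultdict


def full_recompute(rows):
    """Baseline: compute SUM(amount) per key by scanning every row."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def apply_inserts(totals, new_rows):
    """Incremental refresh: fold only the newly inserted rows into existing totals."""
    refreshed = dict(totals)  # copy so the previous result stays unchanged
    for key, amount in new_rows:
        refreshed[key] = refreshed.get(key, 0) + amount
    return refreshed


# Hypothetical sample data: rows already processed, then a new upstream batch.
base_rows = [("north", 10), ("south", 5), ("north", 7)]
new_rows = [("south", 3), ("east", 4)]

totals = full_recompute(base_rows)
refreshed = apply_inserts(totals, new_rows)

# The incremental result matches a full recompute over all rows.
assert refreshed == full_recompute(base_rows + new_rows)
```

The cost of `apply_inserts` depends on the size of the new batch rather than on the total data volume, which is the property that makes incremental refresh attractive for low-latency pipelines.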
High-Performance SQL Analytics
Vectorized execution engine. Industry-leading performance on TPC-DS / TPC-H / SSB benchmarks. Supports OLAP multi-dimensional analysis and ad-hoc queries, up to 10x faster than traditional Spark architectures.
AI Native
Vector indexing, full-text search, AI functions (AI_COMPLETE / AI_EMBEDDING), and semantic views are built into the data platform. Build RAG knowledge bases and AI-enhanced analytics without external services. Data Analytics Agent supports conversational data querying in natural language; Data Engineering Agent supports ETL development in natural language.
Lakehouse AI Overview · Vector Search · AI Functions · Semantic Views · Data Analytics Agent · Data Engineering Agent
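As background on what vector search computes, here is a minimal brute-force sketch in plain Python. It is an assumption-laden illustration, not the platform's vector index or the output format of AI_EMBEDDING: the helper names `cosine_similarity` and `top_k`, the two-dimensional vectors, and the document ids are all hypothetical. It ranks stored vectors by cosine similarity to a query vector and returns the closest matches.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions or more.
documents = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [1.0, 1.0],
}
results = top_k([1.0, 0.1], documents, k=2)
```

A vector index exists to avoid this exhaustive scan on large collections, typically by trading a small amount of recall for much lower query latency.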
Studio & AI Agent Integration
Built-in IDE, task scheduling, data integration, data quality, and operational monitoring — a one-stop data development platform. cz-cli provides a deterministic command interface; semantic views provide a business semantic layer, enabling AI Agents to call data capabilities directly.
Studio User Manual · cz-cli Installation and Usage · Semantic Views
In This Section
| Page | Description |
|---|---|
| Before You Begin | Ways to access Lakehouse: Studio, CLI, drivers and connectors |
| Account Signup and Setup | Register an account, activate a service instance, and complete initialization |
| Cloud Services and Regions | Supported cloud providers and available regions |
