AI Data Preparation
Singdata Lakehouse unifies vector search, full-text search, and structured data analysis on a single platform. AI applications can run retrieval and computation directly where the data lives, with no need to move data to an external vector database or search engine.
Selection Guide
| What you need to do | Recommended approach |
|---|---|
| Semantic similarity search, RAG retrieval, image search | Vector Search |
| Keyword search, log retrieval, Chinese tokenized search | Full-Text Search |
| Vector + keyword hybrid search to improve recall quality | Hybrid Search (RRF) |
| Vector search + structured filtering (e.g., time range, category tags) on the same table | Multi-modal Data Retrieval |
Core Capabilities
Vector Search
Create a vector index (HNSW) on a table to support approximate nearest neighbor (ANN) retrieval. Suitable for semantic search, knowledge base Q&A, image similarity, and similar scenarios.
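To make the idea of nearest neighbor retrieval concrete, the following minimal Python sketch computes the exact top-k neighbors by cosine distance. This is the brute-force baseline that an ANN index such as HNSW approximates at much lower cost. It is not the platform's implementation, and the function names and sample vectors are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def exact_top_k(query, rows, k):
    """Scan every stored vector and return the ids of the k closest ones."""
    scored = [(cosine_distance(query, vec), row_id) for row_id, vec in rows.items()]
    scored.sort()  # smallest distance first
    return [row_id for _, row_id in scored[:k]]

# Hypothetical 3-dimensional embeddings, for illustration only.
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
nearest = exact_top_k([1.0, 0.0, 0.0], docs, k=2)
print(nearest)
```

The exact scan costs time proportional to the number of rows, which is why large tables rely on an approximate index instead.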
Full-Text Search
Full-text search is built on an inverted index and supports Chinese and English tokenization, BM25 relevance ranking, and phrase matching. It is suitable for document search, log retrieval, comment analysis, and similar scenarios.
→ Complete Full-Text Search Guide · BM25 Parameter Tuning
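As a rough illustration of how an inverted index and BM25 ranking fit together, here is a small Python sketch. It assumes whitespace tokenization (which does not work for Chinese text, where a dedicated tokenizer is needed) and the commonly cited defaults `k1=1.2` and `b=0.75`. The platform's actual tokenizers and parameter defaults may differ, and all names and sample logs here are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}; also record document lengths."""
    index = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # assumption: whitespace tokenizer
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths

def bm25_search(query, index, lengths, k1=1.2, b=0.75):
    """Return doc ids that match any query term, ordered by BM25 score."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Rarer terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization penalizes long documents.
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores, key=scores.get, reverse=True)

logs = {
    "log1": "disk error on node seven",
    "log2": "timeout while writing to disk on node three",
    "log3": "user login succeeded",
}
index, lengths = build_inverted_index(logs)
ranked = bm25_search("disk error", index, lengths)
print(ranked)
```

Only documents that appear in the postings of at least one query term are scored, which is what lets an inverted index skip non-matching rows entirely.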
Hybrid Search (RRF)
Merges the ranked results of vector search and full-text search using Reciprocal Rank Fusion (RRF), balancing semantic relevance and exact keyword matching. The fused result list typically achieves better recall than either approach alone.
→ Hybrid Search Best Practices
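The fusion step can be sketched in a few lines of Python. Each document receives a score of 1 / (k + rank) from every result list it appears in, and the scores are summed. The constant `k=60` is the value commonly used in the RRF literature. Whether the platform uses the same constant is an assumption, and the function name and sample result lists are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists into one list ordered by summed RRF score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by id for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists from the two retrieval paths.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Because RRF uses only rank positions, it needs no normalization between vector distances and BM25 scores, which live on different scales.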
Multi-modal Data Retrieval
Build both a vector index and an inverted index on the same table so that a single query can combine vector similarity with structured conditions (time, category, tags), without cross-table JOINs.
→ Multi-modal Data Retrieval Guide
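Conceptually, combining structured conditions with vector similarity looks like the Python sketch below. It applies the structured filter first and then ranks the surviving rows by squared L2 distance. The source does not say whether the engine filters before or after the vector search, so the pre-filter order here is an assumption, as are the column names and sample rows.

```python
from datetime import date

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def filtered_top_k(query, rows, k, category, since):
    """Keep rows matching the structured conditions, then rank by distance."""
    candidates = [
        row for row in rows
        if row["category"] == category and row["created"] >= since
    ]
    candidates.sort(key=lambda row: squared_l2(query, row["embedding"]))
    return [row["id"] for row in candidates[:k]]

# Hypothetical rows: embedding and structured columns live in one table.
rows = [
    {"id": 1, "category": "faq",  "created": date(2024, 3, 1),  "embedding": [0.1, 0.9]},
    {"id": 2, "category": "faq",  "created": date(2023, 1, 15), "embedding": [0.0, 1.0]},
    {"id": 3, "category": "blog", "created": date(2024, 5, 20), "embedding": [0.0, 1.0]},
    {"id": 4, "category": "faq",  "created": date(2024, 6, 2),  "embedding": [0.8, 0.2]},
]
result = filtered_top_k([0.0, 1.0], rows, k=2, category="faq", since=date(2024, 1, 1))
print(result)
```

Rows 2 and 3 are the closest vectors to the query but are excluded by the date and category conditions, which shows why the filter and the similarity ranking have to be evaluated together.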
Typical Scenarios
RAG Knowledge Base Q&A: Ingest documents → vectorize with AI_EMBEDDING → build a vector index → run ANN retrieval on the user query → generate the answer with AI_COMPLETE
Enterprise Search: Chinese tokenized inverted index + vector index hybrid search, balancing exact matching and semantic understanding
Recommendation Systems: Vectorize user behavior → ANN retrieval of similar users or items
Image Retrieval: Store image feature vectors in a VECTOR column → ANN search for similar images
Related Documentation
| Document | Description |
|---|---|
| Vector Search | Vector index creation, ANN search, distance functions |
| Full-Text Search | Inverted index, tokenizers, MATCH queries |
| Hybrid Search Best Practices | Complete RRF fusion ranking example |
| Multi-modal Data Retrieval | Vector + structured filtering combination |
| AI Functions | Built-in SQL AI functions such as AI_EMBEDDING and AI_COMPLETE |
| Vector Index | Vector index DDL syntax reference |
| Inverted Index | Inverted index DDL syntax reference |
