AI Gateway Pricing

This page describes the pricing model and USD list prices of Singdata AI Gateway in overseas regions. For Lakehouse compute, storage, and network resources, see Lakehouse Pricing. For an overview of all Singdata pricing pages, see Pricing and Billing.

Overview

Singdata AI Gateway is a one-stop platform that aggregates and manages mainstream LLMs (Anthropic Claude, OpenAI GPT, Qwen, DeepSeek, GLM, Kimi, MiniMax, and more) behind a unified API, so you do not have to register, integrate, and fund a separate account on each vendor's platform.

💡 Tip: When Analytics Agent calls an LLM, the resulting token consumption is billed at the per-model unit prices listed in this document.

Billing Modes

Pay-as-you-go

Each API call is billed by the number of tokens it actually consumes, with no minimum spend. Metering is captured at the individual call level and can be aggregated by API key, application, or tenant. Bills are issued monthly, and itemized usage is available in the Billing Center of the console.
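As a rough illustration of per-call metering rolled up by API key, application, or tenant, the sketch below sums tokens and cost over a list of call records. The `CallRecord` fields and the `aggregate` helper are hypothetical and do not describe the actual Billing Center export format. The sample costs assume hypothetical prices of $2 and $8 per million input and output tokens.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    # Hypothetical per-call metering record; field names are illustrative only
    # and are not the actual Billing Center export schema.
    api_key: str
    application: str
    tenant: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


def aggregate(records, dimension):
    """Sum tokens and cost per value of `dimension` ("api_key", "application", or "tenant")."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0})
    for record in records:
        bucket = totals[getattr(record, dimension)]
        bucket["input_tokens"] += record.input_tokens
        bucket["output_tokens"] += record.output_tokens
        bucket["cost_usd"] += record.cost_usd
    return dict(totals)


# Sample records priced at a hypothetical $2 (input) / $8 (output) per million tokens.
records = [
    CallRecord("key-a", "chatbot", "tenant-1", 1_000, 2_000, 0.018),
    CallRecord("key-a", "reports", "tenant-1", 500, 100, 0.0018),
    CallRecord("key-b", "chatbot", "tenant-2", 2_000, 1_000, 0.012),
]
by_key = aggregate(records, "api_key")
by_app = aggregate(records, "application")
```

The same records can be rolled up along any of the three dimensions without re-metering, which is why metering at the call level is the useful granularity.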

Billing Dimensions

Different model types are billed along different dimensions. The fields used in the price tables below have the following meanings.

Chat Models Billed by Token

Input and output are priced separately. As a rough guide, one token is about 0.5 Chinese characters or 0.75 English words; exact counts depend on each model's tokenizer.

Field	Meaning
Input	Unit price for tokens in the prompt portion of the request
Output	Unit price for tokens generated by the model
Context Window	Price tier by context-window range; all tokens in a request are billed at the unit prices of the tier that the request falls into
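A minimal sketch of context-window tier selection, assuming the tier is chosen by the request's input token count and that each tier's upper bound is inclusive. The tier boundaries and prices in `TIERS` are hypothetical; real values are model-specific and appear in the price tables.

```python
# Hypothetical tiers for one model: (max input tokens in the tier, input $/M, output $/M).
# Real tier boundaries and prices are model-specific; see the price tables.
TIERS = [
    (32_000, 1.0, 4.0),
    (128_000, 2.0, 8.0),
    (1_000_000, 4.0, 16.0),
]


def select_tier(input_tokens, tiers=TIERS):
    """Return (input_price, output_price) of the first tier whose bound covers the request."""
    for max_tokens, input_price, output_price in tiers:
        if input_tokens <= max_tokens:
            return input_price, output_price
    raise ValueError("request exceeds the largest context-window tier")


def tiered_cost(input_tokens, output_tokens, tiers=TIERS):
    """USD cost of one call; every token in the request uses the selected tier's prices."""
    input_price, output_price = select_tier(input_tokens, tiers)
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


print(tiered_cost(10_000, 1_000))  # first tier: 0.014
print(tiered_cost(50_000, 1_000))  # second tier: 0.108
```

Note that crossing a tier boundary reprices the whole request, not only the tokens above the boundary.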

💡 Tip: For example, suppose a Q&A call consumes about 1,000 input tokens and 2,000 output tokens (the token count for a given amount of text depends on the model's tokenizer). With an input price of $2 per million tokens and an output price of $8 per million tokens, the call costs about 1,000 ÷ 1,000,000 × $2 + 2,000 ÷ 1,000,000 × $8 = $0.002 + $0.016 = $0.018.
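The same arithmetic as a small Python function. `call_cost` is a hypothetical helper for illustration, not part of any Singdata SDK; prices are passed in USD per million tokens, as in the price tables.

```python
def call_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """USD cost of one pay-as-you-go chat call; prices are in USD per million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# The worked example: 1,000 input tokens at $2/M plus 2,000 output tokens at $8/M.
cost = call_cost(1_000, 2_000, input_price_per_m=2.0, output_price_per_m=8.0)
print(f"${cost:.3f}")  # $0.018
```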

Caching and Cost Savings

If you repeatedly send the same leading content (a long system prompt, a fixed knowledge base document, and so on), caching lets the model reuse that content on subsequent calls instead of recomputing it from scratch. Tokens that hit the cache are priced well below the standard input price, which can substantially reduce cost for long prompts and multi-turn conversations.

There are two types of caching, mapped to different columns in the price tables.

Type	How It Works	Corresponding Columns in Price Table
Explicit Cache	You explicitly tell the model to store a segment. A one-time write fee applies on creation (slightly higher than the input price), and a much lower hit fee applies on each subsequent reuse	Explicit·Write, Explicit·Write·5min, Explicit·Write·1h, Explicit·Hit
Implicit Cache	The system automatically detects repeated prefixes and caches them, with no manual action required. There is no write fee; only the lower hit price applies on hits	Implicit·Hit
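Because the explicit-cache write price is above the input price, caching a prefix only pays off once it has been reused enough times. The sketch below, using hypothetical prices, compares the cost of resending the same prefix with and without an explicit cache and computes the break-even number of hits n, the smallest n with n × (input − hit) ≥ write − input. It assumes the cache entry stays valid across all requests. Implicit caching has no write fee, so it has no break-even point. `Fraction` keeps the arithmetic exact.

```python
import math
from fractions import Fraction


def prefix_cost_without_cache(prefix_tokens, requests, input_price):
    """USD cost of sending the same prefix `requests` times at the standard input price."""
    return Fraction(prefix_tokens) * input_price * requests / 1_000_000


def prefix_cost_with_explicit_cache(prefix_tokens, requests, write_price, hit_price):
    """USD cost when the first request writes the prefix and every later request hits it.

    Assumes the cache entry stays alive for all `requests` (no expiry or rewrite).
    """
    return Fraction(prefix_tokens) * (write_price + hit_price * (requests - 1)) / 1_000_000


def break_even_hits(input_price, write_price, hit_price):
    """Smallest number of hits after which the explicit cache costs no more than no cache."""
    return math.ceil((write_price - input_price) / (input_price - hit_price))


# Hypothetical unit prices in USD per million tokens, not taken from the price tables.
INPUT, WRITE, HIT = Fraction("3"), Fraction("3.75"), Fraction("0.30")

print(break_even_hits(INPUT, WRITE, HIT))                              # 1
print(float(prefix_cost_without_cache(20_000, 10, INPUT)))             # 0.6
print(float(prefix_cost_with_explicit_cache(20_000, 10, WRITE, HIT)))  # 0.129
```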

How each column is billed:

Explicit·Write: Charged when a prompt segment is first written to the cache, calculated as the number of tokens written multiplied by this unit price
Explicit·Write·5min / Explicit·Write·1h: Anthropic Claude's explicit cache offers two retention tiers. The 5-minute tier has a lower unit price; the 1-hour tier has a higher unit price but suits cases where the same content is reused repeatedly within an hour
Explicit·Hit: Charged when a subsequent request hits the cache, calculated as the number of hit tokens multiplied by this unit price, which is significantly lower than the input price
Implicit·Hit: When the system detects a repeated prefix in a request, the hit portion is billed at this unit price. Because the system writes to the cache automatically, no separate write fee applies
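A per-request cost sketch that bills each group of tokens at its column price. The `Prices` and `Usage` classes and the `request_cost` function are hypothetical. The sketch assumes cache-write and cache-hit tokens are billed at their cache prices instead of, not in addition to, the standard input price, so only the uncached remainder of the prompt is billed at the input price. For Anthropic Claude, pass whichever of the Explicit·Write·5min or Explicit·Write·1h prices matches the retention tier used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prices:
    """Unit prices in USD per million tokens; None means the model has no such column."""
    input: float
    output: float
    explicit_write: Optional[float] = None  # for Claude: the 5min or 1h write price
    explicit_hit: Optional[float] = None
    implicit_hit: Optional[float] = None


@dataclass
class Usage:
    """Token counts for one request, split by how each group of tokens is billed."""
    uncached_input: int
    output: int
    explicit_write: int = 0
    explicit_hit: int = 0
    implicit_hit: int = 0


def request_cost(usage: Usage, prices: Prices) -> float:
    """USD cost of one request, assuming cache tokens replace (not add to) input billing."""
    total = 0.0
    for field in ("uncached_input", "output", "explicit_write", "explicit_hit", "implicit_hit"):
        tokens = getattr(usage, field)
        price = prices.input if field == "uncached_input" else getattr(prices, field)
        if tokens and price is None:
            raise ValueError(f"model has no {field} price but usage reports {tokens} tokens")
        if tokens:
            total += tokens * price
    return total / 1_000_000


# Hypothetical prices for a model with an explicit cache (not from the price tables).
claude_like = Prices(input=3.0, output=15.0, explicit_write=3.75, explicit_hit=0.30)
first = request_cost(Usage(uncached_input=500, output=800, explicit_write=20_000), claude_like)
later = request_cost(Usage(uncached_input=500, output=800, explicit_hit=20_000), claude_like)
```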

Vendors differ in which cache types they support. The matrix below summarizes current support:

Vendor	Explicit Cache	Implicit Cache
Anthropic Claude	Supported (5-minute and 1-hour write tiers)	Not supported
OpenAI GPT	Not supported	Supported
Qwen	Supported	Partially supported (3.5 series supported; 3.6 / 3.7 series not yet available)
DeepSeek	Partially supported (v3.2 supported)	Supported
GLM	Partially supported (5.1 supported)	Supported
Kimi	Supported	Supported
MiniMax	Not supported	Supported
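To decide which cache columns can apply to a given vendor's models, the support matrix can be kept as a lookup table. This is a hypothetical helper transcribed from the matrix above. Entries marked "partial" still need a per-series check (for example, Qwen's implicit cache currently covers only the 3.5 series), and the table must be updated as vendor support changes.

```python
# Cache support per vendor, transcribed from the support matrix.
# "partial" means only some model series support that cache type.
CACHE_SUPPORT = {
    "Anthropic Claude": {"explicit": "yes", "implicit": "no"},
    "OpenAI GPT": {"explicit": "no", "implicit": "yes"},
    "Qwen": {"explicit": "yes", "implicit": "partial"},
    "DeepSeek": {"explicit": "partial", "implicit": "yes"},
    "GLM": {"explicit": "partial", "implicit": "yes"},
    "Kimi": {"explicit": "yes", "implicit": "yes"},
    "MiniMax": {"explicit": "no", "implicit": "yes"},
}


def cache_columns(vendor):
    """Price-table cache columns that may apply to this vendor's models."""
    support = CACHE_SUPPORT[vendor]
    columns = []
    if support["explicit"] != "no":
        if vendor == "Anthropic Claude":
            # Claude's explicit cache has two write tiers with separate prices.
            columns += ["Explicit·Write·5min", "Explicit·Write·1h"]
        else:
            columns.append("Explicit·Write")
        columns.append("Explicit·Hit")
    if support["implicit"] != "no":
        columns.append("Implicit·Hit")
    return columns
```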

How Multimodal Embeddings Are Billed

Embedding models are priced separately by input data type. Text input uses a single input price, while image and video inputs are billed by the number of multimodal tokens at a unit price higher than text input.
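A minimal sketch of an embedding bill that applies a separate unit price per input type. The prices in `EMBEDDING_PRICES` are hypothetical, and charging image and video at the same multimodal price is an assumption; the per-model values are in the price tables.

```python
# Hypothetical unit prices in USD per million tokens; only the ordering
# (image and video priced above text) reflects the description above.
EMBEDDING_PRICES = {"text": 0.10, "image": 0.50, "video": 0.50}


def embedding_cost(tokens_by_type, prices=EMBEDDING_PRICES):
    """USD cost of an embedding request, given token counts keyed by input type."""
    return sum(tokens * prices[kind] for kind, tokens in tokens_by_type.items()) / 1_000_000


cost = embedding_cost({"text": 10_000, "image": 4_000})
```

An unknown input type raises `KeyError` rather than silently falling back to the text price.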

Overseas Model List Prices

⚠️ Note: The prices below are public list prices. Model market prices fluctuate, so list prices may change; the prices on your bill are authoritative. Overseas list prices exclude tax, and VAT is charged separately according to local tax requirements (for example, an additional 9% GST applies in Singapore under local regulations).
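Because list prices exclude tax, the payable amount is the pre-tax charge plus tax at the applicable local rate. The sketch below applies Singapore's 9% GST using `Decimal` to avoid binary floating-point rounding. Computing tax on the whole amount and rounding it to the cent with half-up rounding are assumptions for illustration, not a description of how invoices are actually computed.

```python
from decimal import Decimal, ROUND_HALF_UP

SINGAPORE_GST_RATE = Decimal("0.09")  # 9% GST


def total_with_tax(pre_tax_usd: Decimal, tax_rate: Decimal) -> Decimal:
    """Pre-tax amount plus tax, with tax rounded half-up to the cent (an assumption)."""
    tax = (pre_tax_usd * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return pre_tax_usd + tax


print(total_with_tax(Decimal("120.00"), SINGAPORE_GST_RATE))  # 130.80
```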