Databricks AI & Model Serving Pricing:
GPUs, Foundation Models, and Vector Search
Databricks AI pricing has three billing dimensions: Model Serving DBUs for endpoint compute, GPU instance time for training and serving, and per-token pricing for Foundation Model APIs. This is the area where Databricks pricing is evolving fastest and where third-party documentation is most sparse.
Databricks AI Pricing Overview
Databricks has expanded significantly into the AI platform space through Mosaic AI (acquired MosaicML), Foundation Model APIs, and integrated MLOps tooling. The pricing model for AI workloads differs from traditional data engineering compute in important ways.
Model Serving DBUs
Both CPU and GPU model serving endpoints charge $0.07/DBU (AWS). The key cost driver is the instance type: a T4 GPU instance consumes about 0.8 DBU/hr, while an A100 consumes 65+ DBU/hr. The low per-DBU rate is offset by the high DBU consumption of GPU instances.
Foundation Model APIs
Pay-per-token pricing for hosted models, including Llama 3.3 and embedding models. Alternatively, provision dedicated GPU throughput for high-volume serving. Unity Catalog manages model governance at no additional charge on the Premium tier.
GPU Compute Time
ML training workloads use standard Jobs Compute pricing ($0.15/DBU on AWS) but run on GPU instances that consume more DBUs per hour. Spot instance support provides significant savings for training jobs that can handle interruptions.
Foundation Model API Pricing
Per-token pricing for Databricks-hosted foundation models. These rates apply to the pay-per-token billing mode where you are charged only for tokens processed.
| Model | Type | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| Llama 3.3 70B | Open Source | $0.50 | $1.50 |
| Llama 3.1 8B | Open Source | $0.15 | $0.45 |
| BGE Large (Embeddings) | Embeddings | $0.10 | N/A |
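Using the table rates, a minimal monthly cost estimate for pay-per-token billing might look like this (the model keys are illustrative shorthand, not official endpoint names):

```python
# Pay-per-token cost sketch using the rates from the table above.
# Rates are USD per 1M tokens; keys are shorthand, not real endpoint names.
RATES = {
    "llama-3.3-70b": {"input": 0.50, "output": 1.50},
    "llama-3.1-8b":  {"input": 0.15, "output": 0.45},
    "bge-large":     {"input": 0.10, "output": 0.0},  # embeddings: no output tokens
}

def token_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Return USD cost for a given token volume on pay-per-token billing."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Example: 20M input + 5M output tokens per month on Llama 3.3 70B
cost = token_cost("llama-3.3-70b", 20_000_000, 5_000_000)
# 20 x $0.50 + 5 x $1.50 = $17.50/month
```

At these rates, even tens of millions of tokens per month stay in the tens of dollars, which is why pay-per-token is the default for development and low-volume workloads.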
When to Use Pay-Per-Token vs Provisioned Throughput
Pay-Per-Token
- Variable or low-volume workloads
- Under ~50M tokens/month
- Development and experimentation
- No minimum commitment
- Latency is acceptable (shared infrastructure)
Provisioned Throughput
- High-volume production serving
- Over ~100M tokens/month
- Predictable throughput requirements
- Billed per GPU-hour, not per token
- Dedicated capacity for low latency
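A rough break-even check between the two billing modes can be sketched as follows; the blended token rate and GPU-hour price here are illustrative placeholders, not published Databricks rates:

```python
# Break-even sketch: pay-per-token vs provisioned throughput.
# The rates below are illustrative assumptions -- substitute your own quote.
def monthly_pay_per_token(tokens_millions: float, blended_rate_per_m: float) -> float:
    """Cost if billed per token (blended input/output rate, USD per 1M tokens)."""
    return tokens_millions * blended_rate_per_m

def monthly_provisioned(gpu_hours: float, rate_per_gpu_hour: float) -> float:
    """Cost if billed per dedicated GPU-hour."""
    return gpu_hours * rate_per_gpu_hour

# Assumed: $1.00/M tokens blended; one 24/7 GPU at a hypothetical $10/GPU-hour
ppt = monthly_pay_per_token(100, 1.00)    # 100M tokens -> $100/month
prov = monthly_provisioned(730, 10.0)     # ~730 hrs/month -> $7,300/month
cheaper = "pay-per-token" if ppt < prov else "provisioned"
```

Under these assumptions, pay-per-token wins by a wide margin at 100M tokens/month; provisioned throughput only pays off once the dedicated GPU is kept busy enough that its per-token cost drops below the metered rate.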
GPU Instance DBU Rates
GPU instances used for model serving and training. The Platform/hr column below assumes the $0.07/DBU model serving rate; the same DBU consumption billed at the Jobs Compute rate ($0.15/DBU) for training costs proportionally more. The per-DBU rate is low, but GPU instances consume many more DBUs per hour than CPU instances, making the actual cost significantly higher.
| GPU | Cloud | Instance | DBU/hr | Platform/hr | Infra/hr |
|---|---|---|---|---|---|
| T4 | AWS | g4dn.xlarge | 0.8 | $0.05 | $0.53 |
| T4 | AZURE | Standard_NC4as_T4_v3 | 0.8 | $0.05 | $0.53 |
| A10G | AWS | g5.xlarge | 2.0 | $0.14 | $1.01 |
| V100 | AWS | p3.2xlarge | 6.5 | $0.46 | $3.06 |
| V100 | AZURE | Standard_NC6s_v3 | 5.0 | $0.35 | $3.06 |
| A100 (40GB) | AWS | p4d.24xlarge (per GPU) | 65.0 | $4.55 | $32.77 |
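Combining the table's columns, the effective hourly cost of a serving endpoint can be sketched as follows (figures copied from the table above, at the $0.07/DBU serving rate):

```python
# Hourly serving cost = platform (DBU/hr x $/DBU) + cloud infrastructure.
# DBU and infra figures are copied from the table above (AWS).
DBU_RATE = 0.07  # USD per DBU, model serving rate (AWS)

def serving_cost_per_hour(dbu_per_hr: float, infra_per_hr: float) -> float:
    """Total effective hourly cost of a serving endpoint."""
    return dbu_per_hr * DBU_RATE + infra_per_hr

t4 = serving_cost_per_hour(0.8, 0.53)       # ~ $0.59/hr (g4dn.xlarge)
a100 = serving_cost_per_hour(65.0, 32.77)   # ~ $37.32/hr (p4d, per GPU)
```

Note that infrastructure dominates the bill at every tier: the platform fee is roughly 10% of the total for a T4 and about 12% for an A100.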
Monthly Cost Examples
Light Serving (T4 GPU)
1 endpoint, 8 hrs/day, 30 days
Platform: $13.44/mo
Infrastructure: $127.20/mo
Total: ~$141/mo
Production Serving (A10G)
2 endpoints, 24/7
Platform: $201/mo
Infrastructure: $1,449/mo
Total: ~$1,650/mo
Heavy Training (A100)
1 instance, 100 hrs/month (spot)
Platform (at the $0.15/DBU Jobs Compute rate): ~$975/mo
Infrastructure (spot): ~$983/mo
Total: ~$1,958/mo
Vector Search Pricing
Databricks Vector Search provides a managed vector database for similarity search, integrated with Unity Catalog. Pricing has two components: storage for the vector index and compute for the serving endpoints.
Endpoint Pricing
Endpoint compute: $0.28 to $1.28/hour depending on endpoint size
Storage: $0.023/GB/month (same as underlying cloud storage)
vs Standalone Vector Databases
For comparison, Pinecone Serverless starts at $0.33 per million read units, Weaviate Cloud starts at $25/month, and self-hosted options like Qdrant carry only infrastructure costs. Vector Search's main advantage is integration: indexes are governed in Unity Catalog alongside the source data, with no separate service to operate.
ML Training Cost Estimation
Training ML models on Databricks uses Jobs Compute pricing ($0.15/DBU on AWS Premium) with GPU-enabled instance types. The primary cost driver is the GPU type and training duration. Spot instances are strongly recommended for training workloads because most modern training frameworks support checkpointing, allowing jobs to resume after spot interruptions.
| Training Scenario | Duration | On-Demand Cost | Spot Cost |
|---|---|---|---|
| Fine-tune small model (T4, 1 GPU) | 4 hrs | $3 | $1 |
| Fine-tune medium model (A10G, 1 GPU) | 8 hrs | $10 | $3 |
| Train custom model (V100, 4 GPUs) | 24 hrs | $92 | $28 |
| Large model training (A100, 8 GPUs) | 72 hrs | $2,950 | $885 |
| Foundation model pre-training (A100 cluster) | 720 hrs | $29,500 | $8,850 |
Estimates include both Databricks platform (DBU) and cloud infrastructure costs on AWS. Spot pricing assumes 70% discount. Actual costs vary by instance availability, region, and training efficiency.
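The spot arithmetic behind these estimates can be sketched as follows, assuming the 70% infrastructure discount noted above and the Jobs Compute DBU rate, which is unaffected by spot:

```python
# Training cost sketch: the spot discount applies only to the infrastructure
# portion of the bill; the Databricks DBU charge is identical on spot and
# on-demand instances.
JOBS_DBU_RATE = 0.15   # USD/DBU, Jobs Compute (AWS Premium)
SPOT_DISCOUNT = 0.70   # assumed infra discount, per the estimates above

def training_cost(hours: float, dbu_per_hr: float, infra_per_hr: float,
                  spot: bool = False) -> float:
    """Total training cost: DBU charge plus (possibly discounted) infra."""
    infra = infra_per_hr * ((1 - SPOT_DISCOUNT) if spot else 1.0)
    return hours * (dbu_per_hr * JOBS_DBU_RATE + infra)

# Fine-tune on one T4 (g4dn.xlarge: 0.8 DBU/hr, $0.53/hr infra), 4 hours
on_demand = training_cost(4, 0.8, 0.53)             # ~ $2.60
with_spot = training_cost(4, 0.8, 0.53, spot=True)  # ~ $1.12
```

Because the DBU charge is fixed, the effective savings from spot shrink as the platform fee's share of the bill grows; the discount is largest on infrastructure-heavy instance types.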
Frequently Asked Questions
How much does Databricks model serving cost?
Databricks model serving uses a DBU-based pricing model at $0.07/DBU for both CPU and GPU serving. However, GPU instances consume far more DBUs per hour than CPU instances. A T4 GPU instance consumes about 0.8 DBU/hr (roughly $0.06/hr in platform fees), while an A100 instance consumes 65+ DBU/hr ($4.55/hr in platform fees). Total serving cost includes both the DBU charge and the cloud infrastructure charge for the underlying GPU instances.
What is the difference between pay-per-token and provisioned throughput?
Pay-per-token charges you per million input and output tokens with no commitment. This is ideal for variable or low-volume workloads. Provisioned throughput gives you dedicated GPU capacity billed by the hour, which is cheaper per token at high volumes but requires a minimum commitment. The break-even depends on your token volume, but provisioned throughput typically becomes cost-effective above roughly 50-100M tokens per month.
How does Databricks AI pricing compare to AWS SageMaker?
For model serving, Databricks and SageMaker are roughly comparable on a per-hour basis for similar GPU instances. Databricks has an advantage for teams already running data engineering on the platform because there is no data movement cost. SageMaker has a broader selection of built-in algorithms and managed training jobs. For foundation model APIs, pricing depends on the specific model and throughput requirements.
What does Databricks Vector Search cost?
Vector Search has two cost components: storage at approximately $0.023 per GB per month (same as underlying cloud storage), and endpoint compute at $0.28 to $1.28 per hour depending on endpoint size. For comparison, Pinecone Serverless starts at $0.33 per million read units, Weaviate Cloud starts at $25/month, and self-hosted options like Qdrant have only infrastructure costs.
Can I use spot instances for ML training on Databricks?
Yes, and this is one of the biggest cost optimization levers for ML workloads. Training jobs are typically fault-tolerant (checkpointing allows resumption after interruption), making them ideal for spot instances. Using spot instances for GPU training can save 60-80% on the cloud infrastructure portion of the bill. The Databricks DBU rate remains the same regardless of spot vs on-demand instances.