Databricks AI & Model Serving Pricing:
GPUs, Foundation Models, and Vector Search
Databricks AI pricing has three billing dimensions: Model Serving DBUs for endpoint compute, GPU instance time for training and serving, and per-token pricing for Foundation Model APIs. This is the area where Databricks pricing is evolving fastest and where third-party documentation is most sparse.
Databricks AI Pricing Overview
Databricks has expanded significantly into the AI platform space through Mosaic AI (acquired MosaicML), Foundation Model APIs, and integrated MLOps tooling. The pricing model for AI workloads differs from traditional data engineering compute in important ways.
Model Serving DBUs
Both CPU and GPU model serving endpoints charge $0.07/DBU (AWS). The key cost driver is the instance type: a T4 GPU instance consumes about 0.8 DBU/hr, while an A100 consumes 65+ DBU/hr. The low per-DBU rate is offset by the high DBU consumption of GPU instances.
Foundation Model APIs
Pay-per-token pricing for hosted models, including Llama 3.3 and embedding models. Alternatively, provision dedicated GPU throughput for high-volume serving. Unity Catalog manages model governance at no additional charge on the Premium tier.
GPU Compute Time
ML training workloads use standard Jobs Compute pricing ($0.15/DBU on AWS) but run on GPU instances that consume more DBUs per hour. Spot instance support provides significant savings for training jobs that can handle interruptions.
Foundation Model API Pricing
Per-token pricing for Databricks-hosted foundation models. These rates apply to the pay-per-token billing mode where you are charged only for tokens processed.
| Model | Type | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| Llama 3.3 70B | Open Source | $0.50 | $1.50 |
| Llama 3.1 8B | Open Source | $0.15 | $0.45 |
| BGE Large (Embeddings) | Embeddings | $0.10 | N/A |
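Using the table rates, a minimal monthly cost estimate for pay-per-token billing might look like this (the model keys are illustrative shorthand, not official endpoint names):

```python
# Pay-per-token cost sketch using the rates from the table above.
# Rates are USD per 1M tokens; keys are shorthand, not real endpoint names.
RATES = {
    "llama-3.3-70b": {"input": 0.50, "output": 1.50},
    "llama-3.1-8b":  {"input": 0.15, "output": 0.45},
    "bge-large":     {"input": 0.10, "output": 0.0},  # embeddings: no output tokens
}

def token_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Return USD cost for a given token volume on pay-per-token billing."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Example: 20M input + 5M output tokens per month on Llama 3.3 70B
cost = token_cost("llama-3.3-70b", 20_000_000, 5_000_000)
# 20 x $0.50 + 5 x $1.50 = $17.50/month
```

At these rates, even tens of millions of tokens per month stay in the tens of dollars, which is why pay-per-token is the default for development and low-volume workloads.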
When to Use Pay-Per-Token vs Provisioned Throughput
Pay-Per-Token
- Variable or low-volume workloads
- Under ~50M tokens/month
- Development and experimentation
- No minimum commitment
- Latency is acceptable (shared infrastructure)
Provisioned Throughput
- High-volume production serving
- Over ~100M tokens/month
- Predictable throughput requirements
- Billed per GPU-hour, not per token
- Dedicated capacity for low latency
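A rough break-even check between the two billing modes can be sketched as follows; the blended token rate and GPU-hour price here are illustrative placeholders, not published Databricks rates:

```python
# Break-even sketch: pay-per-token vs provisioned throughput.
# The rates below are illustrative assumptions -- substitute your own quote.
def monthly_pay_per_token(tokens_millions: float, blended_rate_per_m: float) -> float:
    """Cost if billed per token (blended input/output rate, USD per 1M tokens)."""
    return tokens_millions * blended_rate_per_m

def monthly_provisioned(gpu_hours: float, rate_per_gpu_hour: float) -> float:
    """Cost if billed per dedicated GPU-hour."""
    return gpu_hours * rate_per_gpu_hour

# Assumed: $1.00/M tokens blended; one 24/7 GPU at a hypothetical $10/GPU-hour
ppt = monthly_pay_per_token(100, 1.00)    # 100M tokens -> $100/month
prov = monthly_provisioned(730, 10.0)     # ~730 hrs/month -> $7,300/month
cheaper = "pay-per-token" if ppt < prov else "provisioned"
```

Under these assumptions, pay-per-token wins by a wide margin at 100M tokens/month; provisioned throughput only pays off once the dedicated GPU is kept busy enough that its per-token cost drops below the metered rate.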
GPU Instance DBU Rates
GPU instances used for model serving and training. The Platform/hr column below assumes the $0.07/DBU model serving rate; the same DBU consumption billed at the Jobs Compute rate ($0.15/DBU) for training costs proportionally more. The per-DBU rate is low, but GPU instances consume many more DBUs per hour than CPU instances, making the actual cost significantly higher.
| GPU | Cloud | Instance | DBU/hr | Platform/hr | Infra/hr |
|---|---|---|---|---|---|
| T4 | AWS | g4dn.xlarge | 0.8 | $0.05 | $0.53 |
| T4 | AZURE | Standard_NC4as_T4_v3 | 0.8 | $0.05 | $0.53 |
| A10G | AWS | g5.xlarge | 2.0 | $0.14 | $1.01 |
| V100 | AWS | p3.2xlarge | 6.5 | $0.46 | $3.06 |
| V100 | AZURE | Standard_NC6s_v3 | 5.0 | $0.35 | $3.06 |
| A100 (40GB) | AWS | p4d.24xlarge (per GPU) | 65.0 | $4.55 | $32.77 |
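Combining the table's columns, the effective hourly cost of a serving endpoint can be sketched as follows (figures copied from the table above, at the $0.07/DBU serving rate):

```python
# Hourly serving cost = platform (DBU/hr x $/DBU) + cloud infrastructure.
# DBU and infra figures are copied from the table above (AWS).
DBU_RATE = 0.07  # USD per DBU, model serving rate (AWS)

def serving_cost_per_hour(dbu_per_hr: float, infra_per_hr: float) -> float:
    """Total effective hourly cost of a serving endpoint."""
    return dbu_per_hr * DBU_RATE + infra_per_hr

t4 = serving_cost_per_hour(0.8, 0.53)       # ~ $0.59/hr (g4dn.xlarge)
a100 = serving_cost_per_hour(65.0, 32.77)   # ~ $37.32/hr (p4d, per GPU)
```

Note that infrastructure dominates the bill at every tier: the platform fee is roughly 10% of the total for a T4 and about 12% for an A100.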
Monthly Cost Examples
Light Serving (T4 GPU)
1 endpoint, 8 hrs/day, 30 days
Platform: $13.44/mo
Infrastructure: $127.20/mo
Total: ~$141/mo
Production Serving (A10G)
2 endpoints, 24/7
Platform: $201/mo
Infrastructure: $1,449/mo
Total: ~$1,650/mo
Heavy Training (A100)
1 instance, 100 hrs/month (spot)
Platform (at the $0.15/DBU Jobs Compute rate): ~$975/mo
Infrastructure (spot): ~$983/mo
Total: ~$1,958/mo
Vector Search Pricing
Databricks Vector Search provides a managed vector database for similarity search, integrated with Unity Catalog. Pricing has two components: storage for the vector index and compute for the serving endpoints.
Endpoint Pricing
Endpoint compute: $0.28 to $1.28/hour depending on endpoint size
Storage: $0.023/GB/month (same as underlying cloud storage)
vs Standalone Vector Databases
For comparison, Pinecone Serverless starts at $0.33 per million read units, Weaviate Cloud starts at $25/month, and self-hosted options like Qdrant carry only infrastructure costs. Vector Search's main advantage is integration: indexes are governed in Unity Catalog alongside the source data, with no separate service to operate.
ML Training Cost Estimation
Training ML models on Databricks uses Jobs Compute pricing ($0.15/DBU on AWS Premium) with GPU-enabled instance types. The primary cost driver is the GPU type and training duration. Spot instances are strongly recommended for training workloads because most modern training frameworks support checkpointing, allowing jobs to resume after spot interruptions.
| Training Scenario | Duration | On-Demand Cost | Spot Cost |
|---|---|---|---|
| Fine-tune small model (T4, 1 GPU) | 4 hrs | $3 | $1 |
| Fine-tune medium model (A10G, 1 GPU) | 8 hrs | $10 | $3 |
| Train custom model (V100, 4 GPUs) | 24 hrs | $92 | $28 |
| Large model training (A100, 8 GPUs) | 72 hrs | $2,950 | $885 |
| Foundation model pre-training (A100 cluster) | 720 hrs | $29,500 | $8,850 |
Estimates include both Databricks platform (DBU) and cloud infrastructure costs on AWS. Spot pricing assumes 70% discount. Actual costs vary by instance availability, region, and training efficiency.
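The spot arithmetic behind these estimates can be sketched as follows, assuming the 70% infrastructure discount noted above and the Jobs Compute DBU rate, which is unaffected by spot:

```python
# Training cost sketch: the spot discount applies only to the infrastructure
# portion of the bill; the Databricks DBU charge is identical on spot and
# on-demand instances.
JOBS_DBU_RATE = 0.15   # USD/DBU, Jobs Compute (AWS Premium)
SPOT_DISCOUNT = 0.70   # assumed infra discount, per the estimates above

def training_cost(hours: float, dbu_per_hr: float, infra_per_hr: float,
                  spot: bool = False) -> float:
    """Total training cost: DBU charge plus (possibly discounted) infra."""
    infra = infra_per_hr * ((1 - SPOT_DISCOUNT) if spot else 1.0)
    return hours * (dbu_per_hr * JOBS_DBU_RATE + infra)

# Fine-tune on one T4 (g4dn.xlarge: 0.8 DBU/hr, $0.53/hr infra), 4 hours
on_demand = training_cost(4, 0.8, 0.53)             # ~ $2.60
with_spot = training_cost(4, 0.8, 0.53, spot=True)  # ~ $1.12
```

Because the DBU charge is fixed, the effective savings from spot shrink as the platform fee's share of the bill grows; the discount is largest on infrastructure-heavy instance types.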
Frequently Asked Questions
How much does Databricks model serving cost?
Databricks model serving uses a DBU-based pricing model at $0.07/DBU for both CPU and GPU serving. However, GPU instances consume far more DBUs per hour than CPU instances. A T4 GPU instance consumes about 0.8 DBU/hr (roughly $0.06/hr in platform fees), while an A100 instance consumes 65+ DBU/hr ($4.55/hr in platform fees). Total serving cost includes both the DBU charge and the cloud infrastructure charge for the underlying GPU instances.
What is the difference between pay-per-token and provisioned throughput?
Pay-per-token charges you per million input and output tokens with no commitment. This is ideal for variable or low-volume workloads. Provisioned throughput gives you dedicated GPU capacity billed by the hour, which is cheaper per token at high volumes but requires a minimum commitment. The break-even depends on your token volume, but provisioned throughput typically becomes cost-effective above roughly 50-100M tokens per month.
How does Databricks AI pricing compare to AWS SageMaker?
For model serving, Databricks and SageMaker are roughly comparable on a per-hour basis for similar GPU instances. Databricks has an advantage for teams already running data engineering on the platform because there is no data movement cost. SageMaker has a broader selection of built-in algorithms and managed training jobs. For foundation model APIs, pricing depends on the specific model and throughput requirements.
What does Databricks Vector Search cost?
Vector Search has two cost components: storage at approximately $0.023 per GB per month (same as underlying cloud storage), and endpoint compute at $0.28 to $1.28 per hour depending on endpoint size. For comparison, Pinecone Serverless starts at $0.33 per million read units, Weaviate Cloud starts at $25/month, and self-hosted options like Qdrant have only infrastructure costs.
Can I use spot instances for ML training on Databricks?
Yes, and this is one of the biggest cost optimization levers for ML workloads. Training jobs are typically fault-tolerant (checkpointing allows resumption after interruption), making them ideal for spot instances. Using spot instances for GPU training can save 60-80% on the cloud infrastructure portion of the bill. The Databricks DBU rate remains the same regardless of spot vs on-demand instances.