Databricks Account Onboarding Overview
Welcome to the CloudYali Databricks Account Onboarding Guide! This document provides an overview of integrating your Databricks workspace with CloudYali for comprehensive cost tracking and optimization.
CloudYali connects to Databricks using OAuth Machine-to-Machine (M2M) authentication via a Service Principal to securely access your billing and usage data through Databricks System Tables. This read-only integration enables cost tracking, cluster attribution, and optimization insights without accessing any notebook or job content.
Ready to connect? Jump directly to the step-by-step setup guide.
Overview of Integration Features
Cost Tracking and Attribution
Track your Databricks spending with detailed breakdowns:
- Daily cost reports showing DBU spending trends over time
- SKU-level cost analysis across All-Purpose Compute, Jobs Compute, SQL Warehouse, and other SKUs
- Usage quantity breakdown including DBU consumption per cluster and job
- Workspace attribution for accurate cost allocation across teams
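Breakdowns like these come from read-only queries against Databricks System Tables. The sketch below builds one such query, joining metered DBUs with list prices to estimate daily spend per SKU; it is illustrative only, and the column names should be checked against the `system.billing` schema in your workspace.

```python
def daily_cost_by_sku_query(days: int = 30) -> str:
    """Build a read-only query estimating daily list-price cost per SKU.

    A sketch: joins system.billing.usage to system.billing.list_prices on
    SKU and price validity window. Verify column names in your workspace.
    """
    return f"""
        SELECT
            u.usage_date,
            u.sku_name,
            SUM(u.usage_quantity) AS dbus,
            SUM(u.usage_quantity * p.pricing.default) AS est_list_cost
        FROM system.billing.usage AS u
        JOIN system.billing.list_prices AS p
          ON u.sku_name = p.sku_name
         AND u.usage_start_time >= p.price_start_time
         AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
        WHERE u.usage_date >= current_date() - INTERVAL {days} DAYS
        GROUP BY u.usage_date, u.sku_name
        ORDER BY u.usage_date, est_list_cost DESC
    """

print(daily_cost_by_sku_query(7))
```

Running this against a SQL Warehouse returns one row per day and SKU, which is the shape the daily cost reports above are built from.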
Multi-Workspace Support
For organizations using multiple Databricks workspaces:
- Track costs across all workspaces in a unified dashboard
- Allocate expenses to specific teams or projects
- Compare usage patterns between workspaces
Total Cost of Ownership (TCO)
Get a complete view of your Databricks spending:
- DBU costs from Databricks billing (compute, SQL, serverless)
- AWS cloud infrastructure costs (EC2 instances, EBS storage, networking) correlated via cluster tags
- Per-cluster and per-job TCO combining Databricks DBU costs and AWS infrastructure costs in one view
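Conceptually, the tag-based correlation works like the sketch below: Databricks tags the EC2 instances it launches (for example, with a `ClusterId` tag), so AWS cost rows carrying that tag can be rolled up against DBU spend for the same cluster. This is a simplified in-memory illustration, not CloudYali's actual pipeline, which runs over AWS billing data.

```python
from collections import defaultdict

def per_cluster_tco(dbu_costs, aws_costs):
    """Combine Databricks DBU spend with tag-matched AWS infrastructure spend.

    dbu_costs: {cluster_id: dollars} derived from system.billing.usage.
    aws_costs: AWS cost rows, each with a "cost" and a "tags" dict.
    Returns total cost per cluster (hypothetical data shapes for illustration).
    """
    tco = defaultdict(lambda: {"dbu": 0.0, "infra": 0.0})
    for cluster_id, cost in dbu_costs.items():
        tco[cluster_id]["dbu"] += cost
    for row in aws_costs:
        cluster_id = row.get("tags", {}).get("ClusterId")
        if cluster_id:  # only count infrastructure spend tagged to a cluster
            tco[cluster_id]["infra"] += row["cost"]
    return {cid: v["dbu"] + v["infra"] for cid, v in tco.items()}

example = per_cluster_tco(
    {"0601-abc": 12.0},
    [{"cost": 30.0, "tags": {"ClusterId": "0601-abc", "Vendor": "Databricks"}}],
)
print(example)  # {'0601-abc': 42.0}
```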
Resource Inventory
Track all your Databricks resources in one place:
- Clusters — all-purpose and job cluster configurations, state, and lifecycle
- SQL Warehouses — warehouse type, sizing, scaling settings, and auto-stop configuration
- Jobs — job definitions, owners, schedules, and tags for cost attribution
Unified Multi-Cloud View
View Databricks costs alongside your AWS, GCP, Azure, and Anthropic spending in a single dashboard for complete cloud cost visibility.
Who Should Use This Integration?
- Data engineering teams running ETL/ELT pipelines on Databricks
- Data science teams using interactive notebooks and ML training jobs
- Finance teams needing accurate cost attribution and reporting across Databricks workspaces
- Platform teams managing Databricks infrastructure costs and optimizing cluster configurations
To use this integration, you need:
Prerequisites
- A Databricks workspace with Unity Catalog enabled (for System Tables access)
- Workspace Admin or Account Admin permissions to create a Service Principal
- A Serverless SQL Warehouse (or any SQL Warehouse) for querying System Tables
How It Works
CloudYali integrates with Databricks through three components:
| Component | Purpose | Customer Cost |
|---|---|---|
| Service Principal | OAuth M2M authentication for secure, token-based access | Free |
| SQL Warehouse | Executes read-only queries against System Tables | ~$5/month (Serverless) |
| System Tables | Databricks-managed tables containing billing, cluster, and pricing data | Free (included with Unity Catalog) |
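The SQL Warehouse component is what executes the read-only queries. A minimal connection sketch using the `databricks-sql-connector` package is shown below; the hostname, warehouse ID, and token are placeholders, and in practice the credentials come from the Service Principal's OAuth flow rather than environment variables.

```python
import os

def warehouse_http_path(warehouse_id: str) -> str:
    # SQL Warehouses are addressed by an HTTP path of this form.
    return f"/sql/1.0/warehouses/{warehouse_id}"

# Connection sketch (pip install databricks-sql-connector). Guarded so it
# only runs when placeholder connection details are actually provided.
if os.environ.get("DATABRICKS_HOST"):
    from databricks import sql

    with sql.connect(
        server_hostname=os.environ["DATABRICKS_HOST"],
        http_path=warehouse_http_path(os.environ["DATABRICKS_WAREHOUSE_ID"]),
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT sku_name, SUM(usage_quantity) "
                "FROM system.billing.usage GROUP BY sku_name"
            )
            for sku, dbus in cur.fetchall():
                print(sku, dbus)

print(warehouse_http_path("abc123"))
```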
Data Flow
1. CloudYali authenticates using your Service Principal's OAuth credentials
2. Read-only SQL queries run against system.billing.usage, system.billing.list_prices, and system.compute.clusters
3. Billing data is synced every 6-12 hours with incremental updates
4. Cost data appears in CloudYali's unified dashboard alongside other cloud providers
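The authentication step uses the standard OAuth 2.0 client-credentials grant. The sketch below shows the shape of the token request (compare with the Databricks OAuth M2M documentation before relying on it); the host and credentials are placeholders, and no network call is made here.

```python
import base64

def m2m_token_request(host: str, client_id: str, client_secret: str) -> dict:
    """Build an OAuth 2.0 client-credentials token request for M2M auth.

    A sketch of the request shape only. The returned access token is
    short-lived, so clients refresh it automatically before expiry.
    """
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return {
        "url": f"https://{host}/oidc/v1/token",
        "headers": {
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        "data": {"grant_type": "client_credentials", "scope": "all-apis"},
    }

req = m2m_token_request("dbc-example.cloud.databricks.com", "my-id", "my-secret")
print(req["url"])
```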
Security Considerations
CloudYali never accesses your notebooks, job code, data, or query results. Only billing metadata and cluster configuration data is collected.
- Read-Only Access: CloudYali queries only System Tables (billing, pricing, cluster metadata). No access to your data, notebooks, or job outputs.
- OAuth M2M Authentication: Industry-standard OAuth 2.0 client credentials flow with 1-hour token expiry and automatic refresh.
- Encrypted Storage: Your Service Principal credentials (Client ID and Client Secret) are encrypted at rest using AES-256 and stored in AWS Secrets Manager.
- Minimal Permissions: The Service Principal requires only SELECT access on System Tables — no write permissions, no cluster management, no data access.
Getting Started
Ready to connect your Databricks workspace? Follow the detailed setup guide:
Connect Your Databricks Workspace →
Next Steps
Once your Databricks workspace is connected:
- Budgets & Alerts — Set up spending thresholds and notifications
- Cost Reports — Create custom reports for your Databricks usage
For additional help, please contact our support team at support@cloudyali.io.