Databricks Account Onboarding Overview

Welcome to the CloudYali Databricks Account Onboarding Guide! This document provides an overview of integrating your Databricks workspace with CloudYali for comprehensive cost tracking and optimization.

CloudYali connects to Databricks using OAuth Machine-to-Machine (M2M) authentication via a Service Principal to securely access your billing and usage data through Databricks System Tables. This read-only integration enables cost tracking, cluster attribution, and optimization insights without accessing any notebook or job content.

Quick Start

Ready to connect? Jump directly to the step-by-step setup guide.


Overview of Integration Features

Cost Tracking and Attribution

Track your Databricks spending with detailed breakdowns:

  • Daily cost reports showing DBU spending trends over time
  • SKU-level cost analysis across All-Purpose Compute, Jobs Compute, SQL Warehouse, and other SKUs
  • Usage quantity breakdown including DBU consumption per cluster and job
  • Workspace attribution for accurate cost allocation across teams

Multi-Workspace Support

For organizations using multiple Databricks workspaces:

  • Track costs across all workspaces in a unified dashboard
  • Allocate expenses to specific teams or projects
  • Compare usage patterns between workspaces

Total Cost of Ownership (TCO)

Get a complete view of your Databricks spending:

  • DBU costs from Databricks billing (compute, SQL, serverless)
  • AWS cloud infrastructure costs (EC2 instances, EBS storage, networking) correlated via cluster tags
  • Per-cluster and per-job TCO combining Databricks DBU costs and AWS infrastructure costs in one view
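The tag-based correlation described above can be sketched as a simple join: DBU spend keyed by cluster ID is merged with AWS cost rows that carry a cluster-ID resource tag. This is an illustration only; the field names (`tags`, `ClusterId`, `cost`) are assumptions for the sketch, not CloudYali's actual schema.

```python
# Illustrative sketch: assemble per-cluster TCO by joining Databricks
# DBU spend with AWS infrastructure costs matched on cluster tags.
# Field names here are assumptions, not CloudYali's internal schema.

def cluster_tco(dbu_costs: dict, aws_costs: list) -> dict:
    """dbu_costs: {cluster_id: dbu_dollars}.
    aws_costs: rows each carrying a 'ClusterId' tag and a 'cost' value."""
    tco = {cid: {"dbu": cost, "aws": 0.0} for cid, cost in dbu_costs.items()}
    for row in aws_costs:
        cid = row["tags"].get("ClusterId")
        if cid in tco:
            tco[cid]["aws"] += row["cost"]  # EC2, EBS, networking, etc.
    for v in tco.values():
        v["total"] = v["dbu"] + v["aws"]
    return tco
```

A cluster with $10 of DBU spend and $4 of tagged EC2 cost would therefore show a $14 total in the combined view.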

Resource Inventory

Track all your Databricks resources in one place:

  • Clusters — all-purpose and job cluster configurations, state, and lifecycle
  • SQL Warehouses — warehouse type, sizing, scaling settings, and auto-stop configuration
  • Jobs — job definitions, owners, schedules, and tags for cost attribution

Unified Multi-Cloud View

View Databricks costs alongside your AWS, GCP, Azure, and Anthropic spending in a single dashboard for complete cloud cost visibility.


Who Should Use This Integration?

  • Data engineering teams running ETL/ELT pipelines on Databricks
  • Data science teams using interactive notebooks and ML training jobs
  • Finance teams needing accurate cost attribution and reporting across Databricks workspaces
  • Platform teams managing Databricks infrastructure costs and optimizing cluster configurations

Requirements

To use this integration, you need:

  • A Databricks workspace with Unity Catalog enabled (for System Tables access)
  • Workspace Admin or Account Admin permissions to create a Service Principal
  • A SQL Warehouse for querying System Tables (a small Serverless SQL Warehouse is recommended)

How It Works

CloudYali integrates with Databricks through three components:

| Component | Purpose | Customer Cost |
| --- | --- | --- |
| Service Principal | OAuth M2M authentication for secure, token-based access | Free |
| SQL Warehouse | Executes read-only queries against System Tables | ~$5/month (Serverless) |
| System Tables | Databricks-managed tables containing billing, cluster, and pricing data | Free (included with Unity Catalog) |

Data Flow

  1. CloudYali authenticates using your Service Principal's OAuth credentials
  2. Read-only SQL queries run against system.billing.usage, system.billing.list_prices, and system.compute.clusters
  3. Billing data is synced every 6-12 hours with incremental updates
  4. Cost data appears in CloudYali's unified dashboard alongside other cloud providers
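Steps 1 and 2 above can be sketched with standard Databricks endpoints: the workspace OIDC token endpoint for the OAuth M2M client-credentials flow, and the SQL Statement Execution API for the read-only query. The workspace URL and warehouse ID below are hypothetical placeholders, and this is a simplified sketch of the flow, not CloudYali's implementation.

```python
import base64
import json
import urllib.request

WORKSPACE = "https://dbc-example.cloud.databricks.com"  # hypothetical workspace URL
WAREHOUSE_ID = "abc123"                                 # hypothetical warehouse ID

def usage_query(start_date: str) -> str:
    """Build an incremental, read-only query against the billing system table."""
    return (
        "SELECT workspace_id, sku_name, usage_date, usage_quantity "
        "FROM system.billing.usage "
        f"WHERE usage_date >= '{start_date}'"
    )

def fetch_token(client_id: str, client_secret: str) -> str:
    """Step 1: OAuth M2M client-credentials flow (tokens expire after 1 hour)."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    req = urllib.request.Request(
        f"{WORKSPACE}/oidc/v1/token",
        data=b"grant_type=client_credentials&scope=all-apis",
        headers={"Authorization": f"Basic {creds}",
                 "Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]

def run_query(token: str, sql: str) -> dict:
    """Step 2: submit the read-only statement to the SQL Warehouse."""
    body = json.dumps({"warehouse_id": WAREHOUSE_ID, "statement": sql}).encode()
    req = urllib.request.Request(
        f"{WORKSPACE}/api/2.0/sql/statements",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

An incremental sync would call `run_query(fetch_token(...), usage_query(last_sync_date))` on each cycle, fetching only rows added since the previous run.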

Security Considerations

Data Privacy

CloudYali never accesses your notebooks, job code, data, or query results. Only billing metadata and cluster configuration details are collected.

  • Read-Only Access: CloudYali queries only System Tables (billing, pricing, cluster metadata). No access to your data, notebooks, or job outputs.
  • OAuth M2M Authentication: Industry-standard OAuth 2.0 client credentials flow with 1-hour token expiry and automatic refresh.
  • Encrypted Storage: Your Service Principal credentials (Client ID and Client Secret) are encrypted at rest using AES-256 and stored in AWS Secrets Manager.
  • Minimal Permissions: The Service Principal requires only SELECT access on System Tables — no write permissions, no cluster management, no data access.
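Granting that minimal, SELECT-only access is typically done with Databricks SQL grants on the `system` catalog, run by an admin. The application ID below is a placeholder; substitute your Service Principal's actual application ID, and adjust the table list to match what your setup guide specifies.

```sql
-- Placeholder application ID; substitute your Service Principal's.
GRANT USE CATALOG ON CATALOG system TO `00000000-0000-0000-0000-000000000000`;
GRANT USE SCHEMA ON SCHEMA system.billing TO `00000000-0000-0000-0000-000000000000`;
GRANT SELECT ON TABLE system.billing.usage TO `00000000-0000-0000-0000-000000000000`;
GRANT SELECT ON TABLE system.billing.list_prices TO `00000000-0000-0000-0000-000000000000`;
```

No write, cluster-management, or data-access privileges are granted at any point.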

Getting Started

Ready to connect your Databricks workspace? Follow the detailed setup guide:

Connect Your Databricks Workspace →


Next Steps

Once your Databricks workspace is connected, billing data syncs automatically and appears in your unified dashboard after the first sync cycle (6-12 hours).


For additional help, please contact our support team at support@cloudyali.io.