AZURE BLOB STORAGE

type: azure

Microsoft's cloud object storage. Integrates with Fabric, Synapse, and the Azure data ecosystem.

PREREQUISITES

SDK: azure-storage-blob — installed automatically by:

dvt sync
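If you want to confirm the SDK is importable in the active environment before running a pipeline, a quick self-contained check (illustrative helper, not part of DVT) looks like:

```python
import importlib.util

def sdk_available(module: str = "azure.storage.blob") -> bool:
    """Return True if the given module can be imported in this environment."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `azure`) is not installed at all
        return False
```

If this returns False, re-run dvt sync to install azure-storage-blob.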

CONFIGURATION FIELDS

FIELD        REQUIRED   DEFAULT    DESCRIPTION
type         yes        -          Must be `azure`
account      yes        -          Azure storage account name
container    yes        -          Blob container name
access_key   no         -          Storage account access key
format       no         parquet    Default file format: csv, parquet, json, jsonl, delta, avro
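A minimal validation of an azure output, based only on the rules in the table above, could be sketched like this (illustrative only; DVT's actual validation may differ):

```python
REQUIRED = ("type", "account", "container")
FORMATS = {"csv", "parquet", "json", "jsonl", "delta", "avro"}

def validate_azure_output(cfg: dict) -> list:
    """Return a list of problems with an azure bucket output (empty = valid)."""
    errors = [f"missing required field: {f}" for f in REQUIRED if not cfg.get(f)]
    if cfg.get("type") and cfg["type"] != "azure":
        errors.append("type must be 'azure'")
    fmt = cfg.get("format", "parquet")  # parquet is the built-in default
    if fmt not in FORMATS:
        errors.append(f"unsupported format: {fmt}")
    return errors
```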

PROFILES.YML EXAMPLE

my_project:
  target: prod_sqlserver
  outputs:
    prod_sqlserver:
      type: sqlserver
      host: sql-server.internal.com
      port: 1433
      user: sa
      password: "{{ env_var('MSSQL_PASSWORD') }}"
      database: analytics

    azure_lake:
      type: azure
      account: mystorageaccount
      container: raw-data
      access_key: "{{ env_var('AZURE_ACCESS_KEY') }}"
      format: parquet                  # default format for writes

SOURCES.YML EXAMPLE

sources:
  - name: raw_azure
    connection: azure_lake           # must match profiles.yml output name
    tables:
      - name: transactions
      - name: inventory

MODEL EXAMPLE — BUCKET AS SOURCE

Sling extracts files from the bucket into the DuckDB cache. Model SQL executes in DuckDB (Postgres-like dialect).

-- models/staging/stg_transactions.sql
-- Extraction: Azure Blob → DuckDB cache → SQL Server
-- Written in DuckDB SQL dialect

{{ config(materialized='table') }}

SELECT
    txn_id,
    amount,
    txn_date,
    store_id
FROM {{ source('raw_azure', 'transactions') }}

MODEL EXAMPLE — BUCKET AS TARGET

Model executes on the default target. Sling streams the result to the bucket in the specified format. Use config(format='...') to override the default format set in profiles.yml.

-- models/archive/archive_inventory.sql
-- Materialize to Azure Blob
-- format overrides the default set in profiles.yml

{{ config(
    materialized='table',
    target='azure_lake',
    format='delta'
) }}

SELECT *
FROM {{ ref('current_inventory') }}

-- Supported format values:
--   'csv'      — comma-separated values
--   'parquet'  — columnar (default, recommended)
--   'json'     — JSON objects
--   'jsonl'    — JSON Lines (one record per line)
--   'delta'    — Delta Lake (schema evolution, time travel)
--   'avro'     — Apache Avro

FILE FORMAT CONFIGURATION

The output format is resolved in order of priority:

1. config(format='delta') in the model file (highest priority)
2. format: in the profiles.yml output (project default)
3. parquet (built-in default)
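The priority order above can be sketched as a small resolution function (an illustration of the documented behavior, not DVT's actual implementation):

```python
DEFAULT_FORMAT = "parquet"

def resolve_format(model_config: dict, profile_output: dict) -> str:
    """Resolve the output file format using the documented priority order."""
    # 1. config(format=...) in the model file wins
    if model_config.get("format"):
        return model_config["format"]
    # 2. then the format: key on the profiles.yml output
    if profile_output.get("format"):
        return profile_output["format"]
    # 3. otherwise the built-in default
    return DEFAULT_FORMAT
```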
FORMAT        VALUE     NOTES
Parquet       parquet   Columnar, compressed. Best for analytics. Default.
Delta Lake    delta     Schema evolution, time travel, ACID transactions.
CSV           csv       Universal compatibility. No schema enforcement.
JSON          json      Nested structures, human-readable.
JSON Lines    jsonl     One record per line. Streamable.
Avro          avro      Schema embedded, compact binary. Kafka-friendly.
IPC / Arrow   ipc       Arrow IPC format. Zero-copy reads.

INCREMENTAL EXTRACTION

Incremental models work with cloud storage sources. DVT resolves the watermark from the target database, extracts only changed rows, and merges them in the DuckDB cache.

-- models/staging/stg_inventory.sql
-- Incremental extraction from Azure Blob

{{ config(
    materialized='incremental',
    unique_key='sku'
) }}

SELECT sku, warehouse_id, quantity, updated_at
FROM {{ source('raw_azure', 'inventory') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
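The incremental flow described above, resolving a watermark from the target and keeping only newer source rows, can be sketched in plain Python (illustrative only; DVT's internals may differ):

```python
def resolve_watermark(target_rows, column="updated_at"):
    """Highest watermark value already in the target, or None on the first run."""
    values = [r[column] for r in target_rows if r.get(column) is not None]
    return max(values) if values else None

def filter_incremental(source_rows, watermark, column="updated_at"):
    """Rows newer than the watermark; all rows when there is no watermark yet."""
    if watermark is None:
        return list(source_rows)
    return [r for r in source_rows if r[column] > watermark]
```

On the first run the watermark is None and every source row is extracted; later runs pull only rows past the target's MAX(updated_at), mirroring the is_incremental() predicate in the model above.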

NOTES

  • Supports shared key authentication (storage account access key); SAS tokens are not yet supported

DEBUGGING

Use dvt debug to test bucket connectivity. DVT verifies read/write access for each cloud storage connection.

dvt debug