AZURE BLOB STORAGE

type: azure

Microsoft's cloud object storage. Integrates with Fabric, Synapse, and the Azure data ecosystem.

PREREQUISITES

SDK: azure-storage-blob — installed automatically by:

dvt sync
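If you want to confirm the SDK is importable in the active environment before running a pipeline, a quick self-contained check (illustrative helper, not part of DVT) looks like:

```python
import importlib.util

def sdk_available(module: str = "azure.storage.blob") -> bool:
    """Return True if the given module can be imported in this environment."""
    try:
        return importlib.util.find_spec(module) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `azure`) is not installed at all
        return False
```

If this returns False, re-run dvt sync to install azure-storage-blob.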

CONFIGURATION FIELDS

FIELD        REQUIRED   DEFAULT    DESCRIPTION
type         yes        -          Must be `azure`
account      yes        -          Azure storage account name
container    yes        -          Blob container name
access_key   no         -          Storage account access key
format       no         parquet    Default file format: csv, parquet, json, jsonl, delta, avro
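A minimal validation of an azure output, based only on the rules in the table above, could be sketched like this (illustrative only; DVT's actual validation may differ):

```python
REQUIRED = ("type", "account", "container")
FORMATS = {"csv", "parquet", "json", "jsonl", "delta", "avro"}

def validate_azure_output(cfg: dict) -> list:
    """Return a list of problems with an azure bucket output (empty = valid)."""
    errors = [f"missing required field: {f}" for f in REQUIRED if not cfg.get(f)]
    if cfg.get("type") and cfg["type"] != "azure":
        errors.append("type must be 'azure'")
    fmt = cfg.get("format", "parquet")  # parquet is the built-in default
    if fmt not in FORMATS:
        errors.append(f"unsupported format: {fmt}")
    return errors
```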

PROFILES.YML EXAMPLE

my_project:
  target: prod_sqlserver
  outputs:
    prod_sqlserver:
      type: sqlserver
      host: sql-server.internal.com
      port: 1433
      user: sa
      password: "{{ env_var('MSSQL_PASSWORD') }}"
      database: analytics

    azure_lake:
      type: azure
      account: mystorageaccount
      container: raw-data
      access_key: "{{ env_var('AZURE_ACCESS_KEY') }}"
      format: parquet                  # default format for writes

SOURCES.YML EXAMPLE

sources:
  - name: raw_azure
    connection: azure_lake           # must match profiles.yml output name
    tables:
      - name: transactions
      - name: inventory

MODEL EXAMPLE — BUCKET AS SOURCE

Sling extracts files from the bucket into the DuckDB cache. Model SQL executes in DuckDB (Postgres-like dialect).

-- models/staging/stg_transactions.sql
-- Extraction: Azure Blob → DuckDB cache → SQL Server
-- Written in DuckDB SQL dialect

{{ config(materialized='table') }}

SELECT
    txn_id,
    amount,
    txn_date,
    store_id
FROM {{ source('raw_azure', 'transactions') }}

MODEL EXAMPLE — BUCKET AS TARGET

Model executes on the default target. Sling streams the result to the bucket in the specified format. Use config(format='...') to override the default format set in profiles.yml.

-- models/archive/archive_inventory.sql
-- Materialize to Azure Blob
-- format overrides the default set in profiles.yml

{{ config(
    materialized='table',
    target='azure_lake',
    format='delta'
) }}

SELECT *
FROM {{ ref('current_inventory') }}

-- Supported format values:
--   'csv'      — comma-separated values
--   'parquet'  — columnar (default, recommended)
--   'json'     — JSON objects
--   'jsonl'    — JSON Lines (one record per line)
--   'delta'    — Delta Lake (schema evolution, time travel)
--   'avro'     — Apache Avro

FILE FORMAT CONFIGURATION

The output format is resolved in order of priority:

1. config(format='delta') in the model file (highest priority)
2. format: in the profiles.yml output (project default)
3. parquet (built-in default)
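The priority order above can be sketched as a small resolution function (an illustration of the documented behavior, not DVT's actual implementation):

```python
DEFAULT_FORMAT = "parquet"

def resolve_format(model_config: dict, profile_output: dict) -> str:
    """Resolve the output file format using the documented priority order."""
    # 1. config(format=...) in the model file wins
    if model_config.get("format"):
        return model_config["format"]
    # 2. then the format: key on the profiles.yml output
    if profile_output.get("format"):
        return profile_output["format"]
    # 3. otherwise the built-in default
    return DEFAULT_FORMAT
```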
FORMAT        VALUE     NOTES
Parquet       parquet   Columnar, compressed. Best for analytics. Default.
Delta Lake    delta     Schema evolution, time travel, ACID transactions.
CSV           csv       Universal compatibility. No schema enforcement.
JSON          json      Nested structures, human-readable.
JSON Lines    jsonl     One record per line. Streamable.
Avro          avro      Schema embedded, compact binary. Kafka-friendly.
IPC / Arrow   ipc       Arrow IPC format. Zero-copy reads.

INCREMENTAL EXTRACTION

Incremental models work with cloud storage sources. DVT resolves the watermark from the target database, extracts only changed rows, and merges them in the DuckDB cache.

-- models/staging/stg_inventory.sql
-- Incremental extraction from Azure Blob

{{ config(
    materialized='incremental',
    unique_key='sku'
) }}

SELECT sku, warehouse_id, quantity, updated_at
FROM {{ source('raw_azure', 'inventory') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
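The incremental flow described above, resolving a watermark from the target and keeping only newer source rows, can be sketched in plain Python (illustrative only; DVT's internals may differ):

```python
def resolve_watermark(target_rows, column="updated_at"):
    """Highest watermark value already in the target, or None on the first run."""
    values = [r[column] for r in target_rows if r.get(column) is not None]
    return max(values) if values else None

def filter_incremental(source_rows, watermark, column="updated_at"):
    """Rows newer than the watermark; all rows when there is no watermark yet."""
    if watermark is None:
        return list(source_rows)
    return [r for r in source_rows if r[column] > watermark]
```

On the first run the watermark is None and every source row is extracted; later runs pull only rows past the target's MAX(updated_at), mirroring the is_incremental() predicate in the model above.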

NOTES

  • Supports shared key authentication (storage account access key); SAS tokens are not yet supported

DEBUGGING

Use dvt debug to test bucket connectivity. DVT verifies read/write access for each cloud storage connection.

dvt debug