AZURE BLOB STORAGE
type: azure
Microsoft's cloud object storage. Integrates with Fabric, Synapse, and the Azure data ecosystem.
PREREQUISITES
SDK: azure-storage-blob — installed automatically by:
dvt sync
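If dvt sync cannot install the SDK (for example, in a locked-down environment), you can verify its presence yourself. A minimal check, assuming a standard Python environment:

```python
import importlib.util

# Check whether the azure-storage-blob SDK is importable
# (the top-level module it installs is azure.storage.blob)
try:
    spec = importlib.util.find_spec("azure.storage.blob")
except ModuleNotFoundError:
    spec = None  # parent "azure" namespace package is absent
print("installed" if spec else "missing: run `pip install azure-storage-blob`")
```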
CONFIGURATION FIELDS
| FIELD | REQUIRED | DEFAULT | DESCRIPTION |
|---|---|---|---|
| type | yes | — | Must be `azure` |
| account | yes | — | Azure storage account name |
| container | yes | — | Blob container name |
| access_key | no | — | Storage account access key |
| format | no | parquet | Default file format: csv, parquet, json, jsonl, delta, avro, ipc |
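Credential fields such as access_key are usually populated from the environment via the env_var() template call shown in the profiles.yml example below, rather than hard-coded. A hypothetical sketch of what such a helper does (illustrative only, not DVT's actual implementation):

```python
import os
from typing import Optional

# Illustrative sketch of an env_var()-style template helper (not DVT's code)
def env_var(name: str, default: Optional[str] = None) -> str:
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"environment variable {name} is not set and no default given")
    return value

os.environ["AZURE_ACCESS_KEY"] = "placeholder-key"  # demo only; export this in your shell
print(env_var("AZURE_ACCESS_KEY"))         # placeholder-key
print(env_var("MISSING_VAR", "fallback"))  # fallback
```

A missing variable with no default raises an error at parse time, which surfaces misconfigured secrets early instead of failing mid-run.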
PROFILES.YML EXAMPLE
my_project:
  target: prod_sqlserver
  outputs:
    prod_sqlserver:
      type: sqlserver
      host: sql-server.internal.com
      port: 1433
      user: sa
      password: "{{ env_var('MSSQL_PASSWORD') }}"
      database: analytics
    azure_lake:
      type: azure
      account: mystorageaccount
      container: raw-data
      access_key: "{{ env_var('AZURE_ACCESS_KEY') }}"
      format: parquet  # default format for writes
SOURCES.YML EXAMPLE
sources:
  - name: raw_azure
    connection: azure_lake  # must match profiles.yml output name
    tables:
      - name: transactions
      - name: inventory
MODEL EXAMPLE — BUCKET AS SOURCE
Sling extracts files from the bucket into the DuckDB cache. Model SQL executes in DuckDB (Postgres-like dialect).
-- models/staging/stg_transactions.sql
-- Extraction: Azure Blob → DuckDB cache → SQL Server
-- Written in DuckDB SQL dialect
{{ config(materialized='table') }}
SELECT
    txn_id,
    amount,
    txn_date,
    store_id
FROM {{ source('raw_azure', 'transactions') }}
MODEL EXAMPLE — BUCKET AS TARGET
Model executes on the default target. Sling streams the result to the bucket in the specified format. Use config(format='...') to override the default format set in profiles.yml.
-- models/archive/archive_inventory.sql
-- Materialize to Azure Blob
-- format overrides the default set in profiles.yml
{{ config(
    materialized='table',
    target='azure_lake',
    format='delta'
) }}
SELECT *
FROM {{ ref('current_inventory') }}
-- Supported format values:
-- 'csv' — comma-separated values
-- 'parquet' — columnar (default, recommended)
-- 'json' — JSON objects
-- 'jsonl' — JSON Lines (one record per line)
-- 'delta' — Delta Lake (schema evolution, time travel)
-- 'avro' — Apache Avro
-- 'ipc' — Arrow IPC (zero-copy reads)
FILE FORMAT CONFIGURATION
The output format is resolved in order of priority:
1. config(format='delta') in the model file — highest priority
2. format: in the profiles.yml output — project default
3. parquet — built-in default
| FORMAT | VALUE | NOTES |
|---|---|---|
| Parquet | parquet | Columnar, compressed. Best for analytics. Default. |
| Delta Lake | delta | Schema evolution, time travel, ACID transactions. |
| CSV | csv | Universal compatibility. No schema enforcement. |
| JSON | json | Nested structures, human-readable. |
| JSON Lines | jsonl | One record per line. Streamable. |
| Avro | avro | Schema embedded, compact binary. Kafka-friendly. |
| IPC / Arrow | ipc | Arrow IPC format. Zero-copy reads. |
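The priority order above can be sketched as a simple fallback chain (a minimal illustration; the names here are assumed, not DVT's internals):

```python
# Minimal sketch of the output-format resolution order (names assumed)
BUILTIN_DEFAULT = "parquet"

def resolve_format(model_config: dict, profile_output: dict) -> str:
    # 1. config(format=...) in the model file wins
    # 2. then format: from the profiles.yml output
    # 3. finally the built-in default
    return (
        model_config.get("format")
        or profile_output.get("format")
        or BUILTIN_DEFAULT
    )

print(resolve_format({"format": "delta"}, {"format": "csv"}))  # delta
print(resolve_format({}, {"format": "csv"}))                   # csv
print(resolve_format({}, {}))                                  # parquet
```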
INCREMENTAL EXTRACTION
Incremental models work with cloud storage sources. DVT resolves the watermark from the target database, extracts only changed rows, and merges them in the DuckDB cache.
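Conceptually, the extraction query can be sketched like this (a simplified illustration; the real logic lives inside DVT/Sling and handles quoting, types, and merging):

```python
# Simplified sketch of watermark-based extraction (illustrative only)
from typing import Optional

def build_extract_query(source_table: str, watermark_col: str,
                        watermark: Optional[str]) -> str:
    # First run: no watermark exists, so all rows are extracted.
    # Later runs: only rows newer than the target's high-water mark are pulled.
    query = f"SELECT * FROM {source_table}"
    if watermark is not None:
        query += f" WHERE {watermark_col} > '{watermark}'"
    return query

print(build_extract_query("inventory", "updated_at", None))
print(build_extract_query("inventory", "updated_at", "2024-06-01 00:00:00"))
```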
-- models/staging/stg_inventory.sql
-- Incremental extraction from Azure Blob
{{ config(
    materialized='incremental',
    unique_key='sku'
) }}
SELECT sku, warehouse_id, quantity, updated_at
FROM {{ source('raw_azure', 'inventory') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
NOTES
- ⚠ Supports shared key (access key) auth only; SAS tokens are not yet supported
DEBUGGING
Use dvt debug to test bucket connectivity. DVT verifies read/write access for each cloud storage connection.
dvt debug