Overview Quickstart dbt Compatibility The Style Guide profiles.yml sources.yml Configuration Supported Databases

.SQL MODELS

SQL models are the foundation of DVT. A .sql model is a SQL file in your models/ folder that defines a transformation. DVT compiles the Jinja, executes the SQL, and materializes the result as a table or view in your target database.

DVT is backward-compatible with dbt. If you already know dbt, SQL models in DVT work exactly the same way — with the added ability to query across different database engines.

YOUR FIRST SQL MODEL

Create a .sql file in your models/ folder. Each file becomes one model (one table or view in your database):

-- models/stg_customers.sql

SELECT
    id,
    name,
    email,
    created_at
FROM {{ ref('raw_customers') }}
WHERE email IS NOT NULL

Run it with dvt run --select stg_customers. DVT will create a view (by default) named stg_customers in your target database.

CONFIGURATION

Use the {{ config() }}block at the top of your model to control how it's materialized:

-- models/dim_customers.sql

{{ config(materialized='table') }}

SELECT
    id,
    name,
    email,
    DATE(created_at) as signup_date
FROM {{ ref('stg_customers') }}

Materialization Options

TYPE	WHAT IT DOES	WHEN TO USE
view	Creates a database view (default)	Lightweight transformations, staging layers
table	Creates a physical table (DROP + CREATE)	Heavy transformations, frequently queried tables
incremental	Appends or merges new rows only	Large tables, event data, append-only logs
ephemeral	Injected as CTE into downstream queries	Reusable logic that doesn't need its own table

All Config Options

{{ config(
    materialized='f_table',            -- + view, table, incremental, ephemeral,
                                       --   f_table, f_incremental (federated)
    target='snowflake_prod',           -- where the result lands (federated models only)
    schema='analytics',                -- override default schema
    tags=['staging', 'daily'],         -- for selection: dvt run --select tag:staging
    pre_hook="DELETE FROM ...",        -- SQL to run before this model
    post_hook="GRANT SELECT ON ...",   -- SQL to run after this model
    unique_key='id',                   -- for incremental: merge key
    incremental_strategy='merge',      -- merge, append, delete+insert
) }}

REFERENCING OTHER MODELS WITH REF()

Use {{ ref('model_name') }} to reference another model. DVT resolves it to the correct table name and tracks the dependency in the DAG:

-- models/fct_orders.sql

{{ config(materialized='table') }}

SELECT
    o.order_id,
    o.order_date,
    o.amount,
    c.customer_name,
    c.segment
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('dim_customers') }} c
    ON o.customer_id = c.id

Key point: ref() works with both SQL and Python models. If dim_customersis a Python model, this SQL model can still reference it — but note the downstream model then runs as a federated model in DuckDB dialect (DVT coerces it, with a warning), since dbt's native phase can't see the Python model's output. Declare it f_table to make that explicit.

REFERENCING SOURCES WITH SOURCE()

Use {{ source('source_name', 'table_name') }} to reference raw data defined in sources.yml:

-- models/stg_orders.sql

{{ config(materialized='view') }}

SELECT
    id as order_id,
    customer_id,
    CAST(amount AS DECIMAL(10,2)) as amount,
    created_at as order_date
FROM {{ source('crm', 'orders') }}

The behavior depends on whether the source has a connection: property in sources.yml:

SOURCE CONFIG	EXECUTION PATH	SQL DIALECT
`connection:` names the default target (or is missing — don't)	Pushdown — SQL runs directly on target database	Target's native SQL
`connection:` names any other output	Extraction — data pulled via Sling into DuckDB	DuckDB SQL

TWO SQL DIALECTS

This is the most important concept in DVT: a single project can contain SQL in two different dialects, depending on the execution path.

Pushdown Models (target dialect)

When all sources are on the default target, SQL runs directly on the database. Write in the target's native SQL:

-- Pushdown to PostgreSQL — use PostgreSQL syntax
{{ config(materialized='table') }}

SELECT
    id,
    name,
    created_at::date as signup_date,          -- PG cast syntax
    COALESCE(email, 'unknown') as email
FROM {{ ref('raw_customers') }}
WHERE created_at > NOW() - INTERVAL '30 days' -- PG interval syntax

Extraction Models (DuckDB dialect)

When sources are on different engines (cross-engine query), DVT extracts data into DuckDB and runs the SQL there. Write in DuckDB SQL (Postgres-like):

-- Federated: MySQL source → DuckDB → PostgreSQL target
{{ config(materialized='f_table', target='pg_docker') }}

SELECT
    m.id,
    m.name,
    m.amount,
    'mysql_to_pg' as federation_path
FROM {{ source('mysql_source', 'orders') }} m   -- lives on MySQL
WHERE m.amount > 20                              -- DuckDB SQL syntax

Rule of thumb: federated models (f_table / f_incremental) are written in DuckDB SQL; standard models are written in the default target's native SQL.

CROSS-ENGINE FEDERATION

DVT's superpower: join data from different databases in a single SQL query.

-- models/federation/cross_engine_report.sql
-- Joins MySQL orders with Oracle inventory — DVT handles everything

{{ config(materialized='f_table') }}

SELECT
    o.order_id,
    o.product_id,
    o.quantity,
    i.warehouse_location,
    i.stock_level,
    CASE
        WHEN i.stock_level < o.quantity THEN 'backorder'
        ELSE 'in_stock'
    END as fulfillment_status
FROM {{ source('mysql_ops', 'orders') }} o         -- MySQL
JOIN {{ source('oracle_erp', 'inventory') }} i     -- Oracle
    ON o.product_id = i.product_id

Behind the scenes, DVT:

Extracts orders from MySQL via Sling

Sling streams the data into the local DuckDB cache

Extracts inventory from Oracle via Sling

Same process, stored as a separate table in DuckDB cache

Runs the JOIN in DuckDB

DuckDB executes your SQL locally against both cached tables

Loads the result to your target

Sling streams the result from DuckDB into PostgreSQL (or wherever your target is)

INCREMENTAL MODELS

Incremental models only process new or changed rows, making them efficient for large datasets:

-- models/fct_events.sql

{{
    config(
        materialized='incremental',
        unique_key='event_id',
        incremental_strategy='merge',
    )
}}

SELECT
    event_id,
    user_id,
    event_type,
    event_timestamp,
    properties
FROM {{ source('analytics', 'raw_events') }}

{% if is_incremental() %}
    WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}

Incremental Strategies

STRATEGY	BEHAVIOR	USE WHEN
append	INSERT new rows only	Event logs, immutable data
merge	UPSERT — insert new, update existing	Slowly changing dimensions
delete+insert	DELETE matching rows, then INSERT	Partitioned data, corrected records

TARGETING DIFFERENT DATABASES

Federated models accept a target config to send their output to any profiles.yml output (standard dbt models are refused a non-default target — dbt would silently ignore it):

-- models/federation/mysql_to_oracle.sql
-- Reads from MySQL, writes to Oracle

{{
    config(
        materialized='f_table',      -- federated: only f_* models may set target=
        target='oracle_prod',        -- writes to Oracle (from profiles.yml)
    )
}}

SELECT
    id,
    name,
    amount
FROM {{ source('mysql_ops', 'transactions') }}

EPHEMERAL MODELS

Ephemeral models don't create tables — they're injected as CTEs into downstream queries:

-- models/staging/stg_active_users.sql
{{ config(materialized='ephemeral') }}

SELECT id, name, email
FROM {{ ref('raw_users') }}
WHERE is_active = true

-- models/report.sql
-- The ephemeral model becomes a CTE here:
SELECT * FROM {{ ref('stg_active_users') }}

JINJA TEMPLATING

SQL models support Jinja for dynamic SQL generation:

Variables

-- Use variables defined in dbt_project.yml or CLI
SELECT *
FROM {{ ref('orders') }}
WHERE order_date > '{{ var("start_date", "2024-01-01") }}'

Conditional Logic

SELECT
    id,
    name,
    {% if target.type == 'postgres' %}
        created_at::date as signup_date
    {% elif target.type == 'snowflake' %}
        DATE(created_at) as signup_date
    {% endif %}

Loops

{% set categories = ['electronics', 'clothing', 'food'] %}

SELECT
    order_id,
    {% for cat in categories %}
        SUM(CASE WHEN category = '{{ cat }}' THEN amount ELSE 0 END) as {{ cat }}_total
        {% if not loop.last %},{% endif %}
    {% endfor %}
FROM {{ ref('orders') }}
GROUP BY order_id

COMMON PATTERNS

Staging Layer

-- models/staging/stg_orders.sql
-- Clean and standardize raw data

{{ config(materialized='view') }}

SELECT
    id as order_id,
    TRIM(customer_name) as customer_name,
    CAST(amount AS DECIMAL(10,2)) as amount,
    LOWER(status) as status,
    created_at as order_date
FROM {{ source('raw', 'orders') }}
WHERE status != 'cancelled'

Fact Table

-- models/marts/fct_revenue.sql
-- Business logic, joins multiple staging tables

{{ config(materialized='table') }}

SELECT
    o.order_id,
    o.order_date,
    o.amount,
    c.customer_segment,
    p.product_category,
    o.amount * COALESCE(d.discount_rate, 0) as discount_amount,
    o.amount - (o.amount * COALESCE(d.discount_rate, 0)) as net_revenue
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('dim_customers') }} c ON o.customer_id = c.id
LEFT JOIN {{ ref('dim_products') }} p ON o.product_id = p.id
LEFT JOIN {{ ref('stg_discounts') }} d ON o.discount_code = d.code

RUNNING SQL MODELS

# Run a specific model
dvt run --select stg_customers

# Run all models in a folder
dvt run --select staging.*

# Run a model and all its upstream dependencies
dvt run --select +fct_revenue

# Run a model and everything downstream
dvt run --select stg_customers+

# Run with full refresh (recreate tables from scratch)
dvt run --full-refresh

# Run models with a specific tag
dvt run --select tag:daily

SQL MODELS VS PYTHON MODELS

FEATURE	.SQL MODELS	.PY MODELS
File extension	.sql	.py
Execution	On database or DuckDB	Locally on your machine
Materializations	view, table, incremental, ephemeral, f_table, f_incremental	table (only)
Jinja	Full Jinja support	No Jinja (use dbt.ref() etc.)
Best for	Database transformations, joins, aggregations	API calls, ML, file processing, complex logic
Can ref each other?	Yes — in both directions	Yes — in both directions