.SQL MODELS

SQL models are the foundation of DVT. A .sql model is a SQL file in your models/ folder that defines a transformation. DVT compiles the Jinja, executes the SQL, and materializes the result as a table or view in your target database.

DVT is backward-compatible with dbt. If you already know dbt, SQL models in DVT work exactly the same way — with the added ability to query across different database engines.

YOUR FIRST SQL MODEL

Create a .sql file in your models/ folder. Each file becomes one model (one table or view in your database):

-- models/stg_customers.sql

SELECT
    id,
    name,
    email,
    created_at
FROM {{ ref('raw_customers') }}
WHERE email IS NOT NULL

Run it with dvt run --select stg_customers. DVT will create a view (by default) named stg_customers in your target database.

CONFIGURATION

Use the {{ config() }}block at the top of your model to control how it's materialized:

-- models/dim_customers.sql

{{ config(materialized='table') }}

SELECT
    id,
    name,
    email,
    DATE(created_at) as signup_date
FROM {{ ref('stg_customers') }}

Materialization Options

TYPEWHAT IT DOESWHEN TO USE
viewCreates a database view (default)Lightweight transformations, staging layers
tableCreates a physical table (DROP + CREATE)Heavy transformations, frequently queried tables
incrementalAppends or merges new rows onlyLarge tables, event data, append-only logs
ephemeralInjected as CTE into downstream queriesReusable logic that doesn't need its own table

All Config Options

{{ config(
    materialized='table',              -- view, table, incremental, ephemeral
    target='snowflake_prod',           -- override default target (DVT-specific)
    schema='analytics',                -- override default schema
    tags=['staging', 'daily'],         -- for selection: dvt run --select tag:staging
    pre_hook="DELETE FROM ...",        -- SQL to run before this model
    post_hook="GRANT SELECT ON ...",   -- SQL to run after this model
    unique_key='id',                   -- for incremental: merge key
    incremental_strategy='merge',      -- merge, append, delete+insert
) }}

REFERENCING OTHER MODELS WITH REF()

Use {{ ref('model_name') }} to reference another model. DVT resolves it to the correct table name and tracks the dependency in the DAG:

-- models/fct_orders.sql

{{ config(materialized='table') }}

SELECT
    o.order_id,
    o.order_date,
    o.amount,
    c.customer_name,
    c.segment
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('dim_customers') }} c
    ON o.customer_id = c.id

Key point: ref() works with both SQL and Python models. If dim_customersis a Python model, this SQL model can still reference it — the Python model's output table is just a regular table in your database.

REFERENCING SOURCES WITH SOURCE()

Use {{ source('source_name', 'table_name') }} to reference raw data defined in sources.yml:

-- models/stg_orders.sql

{{ config(materialized='view') }}

SELECT
    id as order_id,
    customer_id,
    CAST(amount AS DECIMAL(10,2)) as amount,
    created_at as order_date
FROM {{ source('crm', 'orders') }}

The behavior depends on whether the source has a connection: property in sources.yml:

SOURCE CONFIGEXECUTION PATHSQL DIALECT
No connection:Pushdown — SQL runs directly on target databaseTarget's native SQL
Has connection:Extraction — data pulled via Sling into DuckDBDuckDB SQL

TWO SQL DIALECTS

This is the most important concept in DVT: a single project can contain SQL in two different dialects, depending on the execution path.

Pushdown Models (target dialect)

When all sources are on the default target, SQL runs directly on the database. Write in the target's native SQL:

-- Pushdown to PostgreSQL — use PostgreSQL syntax
{{ config(materialized='table') }}

SELECT
    id,
    name,
    created_at::date as signup_date,          -- PG cast syntax
    COALESCE(email, 'unknown') as email
FROM {{ ref('raw_customers') }}
WHERE created_at > NOW() - INTERVAL '30 days' -- PG interval syntax

Extraction Models (DuckDB dialect)

When sources are on different engines (cross-engine query), DVT extracts data into DuckDB and runs the SQL there. Write in DuckDB SQL (Postgres-like):

-- Extraction: MySQL source → DuckDB → PostgreSQL target
{{ config(materialized='table', target='pg_docker') }}

SELECT
    m.id,
    m.name,
    m.amount,
    'mysql_to_pg' as federation_path
FROM {{ source('mysql_source', 'orders') }} m   -- lives on MySQL
WHERE m.amount > 20                              -- DuckDB SQL syntax

Rule of thumb: If your model references sources with connection:, write DuckDB SQL. If all sources are on the default target, write the target's native SQL.

CROSS-ENGINE FEDERATION

DVT's superpower: join data from different databases in a single SQL query.

-- models/federation/cross_engine_report.sql
-- Joins MySQL orders with Oracle inventory — DVT handles everything

{{ config(materialized='table') }}

SELECT
    o.order_id,
    o.product_id,
    o.quantity,
    i.warehouse_location,
    i.stock_level,
    CASE
        WHEN i.stock_level < o.quantity THEN 'backorder'
        ELSE 'in_stock'
    END as fulfillment_status
FROM {{ source('mysql_ops', 'orders') }} o         -- MySQL
JOIN {{ source('oracle_erp', 'inventory') }} i     -- Oracle
    ON o.product_id = i.product_id

Behind the scenes, DVT:

1

Extracts orders from MySQL via Sling

Sling streams the data into the local DuckDB cache

2

Extracts inventory from Oracle via Sling

Same process, stored as a separate table in DuckDB cache

3

Runs the JOIN in DuckDB

DuckDB executes your SQL locally against both cached tables

4

Loads the result to your target

Sling streams the result from DuckDB into PostgreSQL (or wherever your target is)

INCREMENTAL MODELS

Incremental models only process new or changed rows, making them efficient for large datasets:

-- models/fct_events.sql

{{
    config(
        materialized='incremental',
        unique_key='event_id',
        incremental_strategy='merge',
    )
}}

SELECT
    event_id,
    user_id,
    event_type,
    event_timestamp,
    properties
FROM {{ source('analytics', 'raw_events') }}

{% if is_incremental() %}
    WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}

Incremental Strategies

STRATEGYBEHAVIORUSE WHEN
appendINSERT new rows onlyEvent logs, immutable data
mergeUPSERT — insert new, update existingSlowly changing dimensions
delete+insertDELETE matching rows, then INSERTPartitioned data, corrected records

TARGETING DIFFERENT DATABASES

Use the targetconfig to send a model's output to a specific database (not the default):

-- models/federation/mysql_to_oracle.sql
-- Reads from MySQL, writes to Oracle

{{
    config(
        materialized='table',
        target='oracle_prod',        -- writes to Oracle (from profiles.yml)
    )
}}

SELECT
    id,
    name,
    amount
FROM {{ source('mysql_ops', 'transactions') }}

EPHEMERAL MODELS

Ephemeral models don't create tables — they're injected as CTEs into downstream queries:

-- models/staging/stg_active_users.sql
{{ config(materialized='ephemeral') }}

SELECT id, name, email
FROM {{ ref('raw_users') }}
WHERE is_active = true

-- models/report.sql
-- The ephemeral model becomes a CTE here:
SELECT * FROM {{ ref('stg_active_users') }}

JINJA TEMPLATING

SQL models support Jinja for dynamic SQL generation:

Variables

-- Use variables defined in dbt_project.yml or CLI
SELECT *
FROM {{ ref('orders') }}
WHERE order_date > '{{ var("start_date", "2024-01-01") }}'

Conditional Logic

SELECT
    id,
    name,
    {% if target.type == 'postgres' %}
        created_at::date as signup_date
    {% elif target.type == 'snowflake' %}
        DATE(created_at) as signup_date
    {% endif %}

Loops

{% set categories = ['electronics', 'clothing', 'food'] %}

SELECT
    order_id,
    {% for cat in categories %}
        SUM(CASE WHEN category = '{{ cat }}' THEN amount ELSE 0 END) as {{ cat }}_total
        {% if not loop.last %},{% endif %}
    {% endfor %}
FROM {{ ref('orders') }}
GROUP BY order_id

COMMON PATTERNS

Staging Layer

-- models/staging/stg_orders.sql
-- Clean and standardize raw data

{{ config(materialized='view') }}

SELECT
    id as order_id,
    TRIM(customer_name) as customer_name,
    CAST(amount AS DECIMAL(10,2)) as amount,
    LOWER(status) as status,
    created_at as order_date
FROM {{ source('raw', 'orders') }}
WHERE status != 'cancelled'

Fact Table

-- models/marts/fct_revenue.sql
-- Business logic, joins multiple staging tables

{{ config(materialized='table') }}

SELECT
    o.order_id,
    o.order_date,
    o.amount,
    c.customer_segment,
    p.product_category,
    o.amount * COALESCE(d.discount_rate, 0) as discount_amount,
    o.amount - (o.amount * COALESCE(d.discount_rate, 0)) as net_revenue
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('dim_customers') }} c ON o.customer_id = c.id
LEFT JOIN {{ ref('dim_products') }} p ON o.product_id = p.id
LEFT JOIN {{ ref('stg_discounts') }} d ON o.discount_code = d.code

RUNNING SQL MODELS

# Run a specific model
dvt run --select stg_customers

# Run all models in a folder
dvt run --select staging.*

# Run a model and all its upstream dependencies
dvt run --select +fct_revenue

# Run a model and everything downstream
dvt run --select stg_customers+

# Run with full refresh (recreate tables from scratch)
dvt run --full-refresh

# Run models with a specific tag
dvt run --select tag:daily

SQL MODELS VS PYTHON MODELS

FEATURE.SQL MODELS.PY MODELS
File extension.sql.py
ExecutionOn database or DuckDBLocally on your machine
Materializationsview, table, incremental, ephemeraltable, incremental
JinjaFull Jinja supportNo Jinja (use dbt.ref() etc.)
Best forDatabase transformations, joins, aggregationsAPI calls, ML, file processing, complex logic
Can ref each other?Yes — in both directionsYes — in both directions