.SQL MODELS
SQL models are the foundation of DVT. A .sql model is a SQL file in your models/ folder that defines a transformation. DVT compiles the Jinja, executes the SQL, and materializes the result as a table or view in your target database.
DVT is backward-compatible with dbt. If you already know dbt, SQL models in DVT work exactly the same way — with the added ability to query across different database engines.
YOUR FIRST SQL MODEL
Create a .sql file in your models/ folder. Each file becomes one model (one table or view in your database):
-- models/stg_customers.sql
SELECT
id,
name,
email,
created_at
FROM {{ ref('raw_customers') }}
WHERE email IS NOT NULLRun it with dvt run --select stg_customers. DVT will create a view (by default) named stg_customers in your target database.
CONFIGURATION
Use the {{ config() }}block at the top of your model to control how it's materialized:
-- models/dim_customers.sql
{{ config(materialized='table') }}
SELECT
id,
name,
email,
DATE(created_at) as signup_date
FROM {{ ref('stg_customers') }}Materialization Options
| TYPE | WHAT IT DOES | WHEN TO USE |
|---|---|---|
| view | Creates a database view (default) | Lightweight transformations, staging layers |
| table | Creates a physical table (DROP + CREATE) | Heavy transformations, frequently queried tables |
| incremental | Appends or merges new rows only | Large tables, event data, append-only logs |
| ephemeral | Injected as CTE into downstream queries | Reusable logic that doesn't need its own table |
All Config Options
{{ config(
materialized='table', -- view, table, incremental, ephemeral
target='snowflake_prod', -- override default target (DVT-specific)
schema='analytics', -- override default schema
tags=['staging', 'daily'], -- for selection: dvt run --select tag:staging
pre_hook="DELETE FROM ...", -- SQL to run before this model
post_hook="GRANT SELECT ON ...", -- SQL to run after this model
unique_key='id', -- for incremental: merge key
incremental_strategy='merge', -- merge, append, delete+insert
) }}REFERENCING OTHER MODELS WITH REF()
Use {{ ref('model_name') }} to reference another model. DVT resolves it to the correct table name and tracks the dependency in the DAG:
-- models/fct_orders.sql
{{ config(materialized='table') }}
SELECT
o.order_id,
o.order_date,
o.amount,
c.customer_name,
c.segment
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('dim_customers') }} c
ON o.customer_id = c.idKey point: ref() works with both SQL and Python models. If dim_customersis a Python model, this SQL model can still reference it — the Python model's output table is just a regular table in your database.
REFERENCING SOURCES WITH SOURCE()
Use {{ source('source_name', 'table_name') }} to reference raw data defined in sources.yml:
-- models/stg_orders.sql
{{ config(materialized='view') }}
SELECT
id as order_id,
customer_id,
CAST(amount AS DECIMAL(10,2)) as amount,
created_at as order_date
FROM {{ source('crm', 'orders') }}The behavior depends on whether the source has a connection: property in sources.yml:
| SOURCE CONFIG | EXECUTION PATH | SQL DIALECT |
|---|---|---|
No connection: | Pushdown — SQL runs directly on target database | Target's native SQL |
Has connection: | Extraction — data pulled via Sling into DuckDB | DuckDB SQL |
TWO SQL DIALECTS
This is the most important concept in DVT: a single project can contain SQL in two different dialects, depending on the execution path.
Pushdown Models (target dialect)
When all sources are on the default target, SQL runs directly on the database. Write in the target's native SQL:
-- Pushdown to PostgreSQL — use PostgreSQL syntax
{{ config(materialized='table') }}
SELECT
id,
name,
created_at::date as signup_date, -- PG cast syntax
COALESCE(email, 'unknown') as email
FROM {{ ref('raw_customers') }}
WHERE created_at > NOW() - INTERVAL '30 days' -- PG interval syntaxExtraction Models (DuckDB dialect)
When sources are on different engines (cross-engine query), DVT extracts data into DuckDB and runs the SQL there. Write in DuckDB SQL (Postgres-like):
-- Extraction: MySQL source → DuckDB → PostgreSQL target
{{ config(materialized='table', target='pg_docker') }}
SELECT
m.id,
m.name,
m.amount,
'mysql_to_pg' as federation_path
FROM {{ source('mysql_source', 'orders') }} m -- lives on MySQL
WHERE m.amount > 20 -- DuckDB SQL syntaxRule of thumb: If your model references sources with connection:, write DuckDB SQL. If all sources are on the default target, write the target's native SQL.
CROSS-ENGINE FEDERATION
DVT's superpower: join data from different databases in a single SQL query.
-- models/federation/cross_engine_report.sql
-- Joins MySQL orders with Oracle inventory — DVT handles everything
{{ config(materialized='table') }}
SELECT
o.order_id,
o.product_id,
o.quantity,
i.warehouse_location,
i.stock_level,
CASE
WHEN i.stock_level < o.quantity THEN 'backorder'
ELSE 'in_stock'
END as fulfillment_status
FROM {{ source('mysql_ops', 'orders') }} o -- MySQL
JOIN {{ source('oracle_erp', 'inventory') }} i -- Oracle
ON o.product_id = i.product_idBehind the scenes, DVT:
Extracts orders from MySQL via Sling
Sling streams the data into the local DuckDB cache
Extracts inventory from Oracle via Sling
Same process, stored as a separate table in DuckDB cache
Runs the JOIN in DuckDB
DuckDB executes your SQL locally against both cached tables
Loads the result to your target
Sling streams the result from DuckDB into PostgreSQL (or wherever your target is)
INCREMENTAL MODELS
Incremental models only process new or changed rows, making them efficient for large datasets:
-- models/fct_events.sql
{{
config(
materialized='incremental',
unique_key='event_id',
incremental_strategy='merge',
)
}}
SELECT
event_id,
user_id,
event_type,
event_timestamp,
properties
FROM {{ source('analytics', 'raw_events') }}
{% if is_incremental() %}
WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}Incremental Strategies
| STRATEGY | BEHAVIOR | USE WHEN |
|---|---|---|
| append | INSERT new rows only | Event logs, immutable data |
| merge | UPSERT — insert new, update existing | Slowly changing dimensions |
| delete+insert | DELETE matching rows, then INSERT | Partitioned data, corrected records |
TARGETING DIFFERENT DATABASES
Use the targetconfig to send a model's output to a specific database (not the default):
-- models/federation/mysql_to_oracle.sql
-- Reads from MySQL, writes to Oracle
{{
config(
materialized='table',
target='oracle_prod', -- writes to Oracle (from profiles.yml)
)
}}
SELECT
id,
name,
amount
FROM {{ source('mysql_ops', 'transactions') }}EPHEMERAL MODELS
Ephemeral models don't create tables — they're injected as CTEs into downstream queries:
-- models/staging/stg_active_users.sql
{{ config(materialized='ephemeral') }}
SELECT id, name, email
FROM {{ ref('raw_users') }}
WHERE is_active = true
-- models/report.sql
-- The ephemeral model becomes a CTE here:
SELECT * FROM {{ ref('stg_active_users') }}JINJA TEMPLATING
SQL models support Jinja for dynamic SQL generation:
Variables
-- Use variables defined in dbt_project.yml or CLI
SELECT *
FROM {{ ref('orders') }}
WHERE order_date > '{{ var("start_date", "2024-01-01") }}'Conditional Logic
SELECT
id,
name,
{% if target.type == 'postgres' %}
created_at::date as signup_date
{% elif target.type == 'snowflake' %}
DATE(created_at) as signup_date
{% endif %}Loops
{% set categories = ['electronics', 'clothing', 'food'] %}
SELECT
order_id,
{% for cat in categories %}
SUM(CASE WHEN category = '{{ cat }}' THEN amount ELSE 0 END) as {{ cat }}_total
{% if not loop.last %},{% endif %}
{% endfor %}
FROM {{ ref('orders') }}
GROUP BY order_idCOMMON PATTERNS
Staging Layer
-- models/staging/stg_orders.sql
-- Clean and standardize raw data
{{ config(materialized='view') }}
SELECT
id as order_id,
TRIM(customer_name) as customer_name,
CAST(amount AS DECIMAL(10,2)) as amount,
LOWER(status) as status,
created_at as order_date
FROM {{ source('raw', 'orders') }}
WHERE status != 'cancelled'Fact Table
-- models/marts/fct_revenue.sql
-- Business logic, joins multiple staging tables
{{ config(materialized='table') }}
SELECT
o.order_id,
o.order_date,
o.amount,
c.customer_segment,
p.product_category,
o.amount * COALESCE(d.discount_rate, 0) as discount_amount,
o.amount - (o.amount * COALESCE(d.discount_rate, 0)) as net_revenue
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('dim_customers') }} c ON o.customer_id = c.id
LEFT JOIN {{ ref('dim_products') }} p ON o.product_id = p.id
LEFT JOIN {{ ref('stg_discounts') }} d ON o.discount_code = d.codeRUNNING SQL MODELS
# Run a specific model dvt run --select stg_customers # Run all models in a folder dvt run --select staging.* # Run a model and all its upstream dependencies dvt run --select +fct_revenue # Run a model and everything downstream dvt run --select stg_customers+ # Run with full refresh (recreate tables from scratch) dvt run --full-refresh # Run models with a specific tag dvt run --select tag:daily
SQL MODELS VS PYTHON MODELS
| FEATURE | .SQL MODELS | .PY MODELS |
|---|---|---|
| File extension | .sql | .py |
| Execution | On database or DuckDB | Locally on your machine |
| Materializations | view, table, incremental, ephemeral | table, incremental |
| Jinja | Full Jinja support | No Jinja (use dbt.ref() etc.) |
| Best for | Database transformations, joins, aggregations | API calls, ML, file processing, complex logic |
| Can ref each other? | Yes — in both directions | Yes — in both directions |