SOURCES.YML
The sources.yml file declares where your raw data lives. It maps database tables to source references that your models can use with {{ source('name', 'table') }}.
THE CONNECTION: PROPERTY
The connection:property is DVT's key extension to dbt's sources.yml. It links a source to a specific output in profiles.yml, telling DVT which database this data lives on.
External sources (different engine than default)
Any source that lives on a different database than the default target must declare connection: pointing to the profiles.yml output name. Without it, DVT assumes the source is on the default target and SQL will fail.
Default target sources (same engine as default)
Sources that live on the default target must not use the connection: property. These sources are accessed directly by the adapter via pushdown — they don't need extraction. Providing connection: for a default-target source will cause an error.
EXAMPLE
# models/staging/sources.yml
version: 2
sources:
# ─── External Sources (need connection:) ───
- name: crm
description: "CRM data from MySQL operational database"
connection: mysql_ops # ← REQUIRED: points to profiles.yml
tables:
- name: customers
description: "Customer master data"
- name: orders
description: "Sales orders"
- name: products
- name: warehouse
description: "ERP data from Snowflake"
connection: sf_warehouse # ← REQUIRED: points to profiles.yml
database: PROD_DB
schema: RAW
tables:
- name: invoices
- name: shipments
- name: vendors
- name: data_lake
description: "Raw files from S3"
connection: data_lake # ← works with bucket storages too
tables:
- name: web_events
- name: app_logs
# ─── Default Target Sources (NO connection:) ───
- name: analytics
description: "Tables on the default PostgreSQL target"
schema: public
# NO connection: — lives on the default target
tables:
- name: seed_categories
- name: dim_dates
- name: historical_metricsHOW DVT USES SOURCES
When a model references a source, DVT checks the source's connection:against the model's target to determine the execution path:
| SOURCE CONFIG | WHAT HAPPENS | SQL DIALECT |
|---|---|---|
No connection: | Adapter pushdown — SQL runs directly on the default target | Target's native SQL |
connection: same engine | Extraction — Sling pulls data into DuckDB cache | DuckDB SQL |
connection: different engine | Extraction — Sling pulls data into DuckDB cache | DuckDB SQL |
If any source in a model has a connection:, the entire model switches to the extraction path and runs in DuckDB SQL. You cannot mix pushdown and extraction sources in the same model.
COMMON PATTERNS
Multi-source extraction model:
-- models/marts/cross_engine_report.sql
-- crm is on MySQL, warehouse is on Snowflake
-- Both get extracted → DuckDB joins them → result loads to default target
{{ config(materialized='table') }}
SELECT
c.customer_name,
i.invoice_date,
i.total_amount
FROM {{ source('crm', 'customers') }} c -- MySQL (extracted)
JOIN {{ source('warehouse', 'invoices') }} i -- Snowflake (extracted)
ON c.customer_id = i.customer_idDefault-target pushdown model:
-- models/staging/stg_categories.sql
-- analytics source has no connection: → pushdown on default target
-- SQL dialect: PostgreSQL (native)
{{ config(materialized='view') }}
SELECT id, name, parent_id
FROM {{ source('analytics', 'seed_categories') }}Bucket storage source:
-- models/staging/stg_web_events.sql
-- data_lake source points to S3 → extracted via Sling
{{ config(materialized='table') }}
SELECT event_id, user_id, event_type, event_timestamp
FROM {{ source('data_lake', 'web_events') }}SOURCE PROPERTIES REFERENCE
| PROPERTY | REQUIRED | DESCRIPTION |
|---|---|---|
| name | yes | Unique name for this source group |
| connection | no | profiles.yml output name. Required for external sources. Must NOT be set for default-target sources. |
| database | no | Database name (overrides the output's default) |
| schema | no | Schema name (overrides the output's default) |
| description | no | Human-readable description for documentation |
| tables | yes | List of table objects with at minimum a name field |
| tags | no | Tags for source selection (dvt run --select tag:...) |
| freshness | no | Freshness check configuration (standard dbt) |
COMMON MISTAKES
WRONG
sources:
- name: analytics
connection: pg_dev # default target!
tables:
- name: dim_datesCORRECT
sources:
- name: analytics
# No connection — on default target
tables:
- name: dim_datesDon't add connection: for sources on the default target. It will trigger unnecessary extraction through DuckDB when the adapter can query the table directly.
WRONG
sources:
- name: crm
# Missing connection!
tables:
- name: customers # lives on MySQLCORRECT
sources:
- name: crm
connection: mysql_ops # external source
tables:
- name: customersExternal sources must have connection:. Without it, DVT assumes the table is on the default target and the query will fail with 'relation does not exist'.