SOURCES.YML

The sources.yml file declares where your raw data lives. It maps database tables to source references that your models can use with {{ source('name', 'table') }}.

THE CONNECTION: PROPERTY

The connection:property is DVT's key extension to dbt's sources.yml. It links a source to a specific output in profiles.yml, telling DVT which database this data lives on.

External sources (different engine than default)

MUST have connection:

Any source that lives on a different database than the default target must declare connection: pointing to the profiles.yml output name. Without it, DVT assumes the source is on the default target and SQL will fail.

Default target sources (same engine as default)

MUST NOT have connection:

Sources that live on the default target must not use the connection: property. These sources are accessed directly by the adapter via pushdown — they don't need extraction. Providing connection: for a default-target source will cause an error.

EXAMPLE

# models/staging/sources.yml
version: 2

sources:
  # ─── External Sources (need connection:) ───

  - name: crm
    description: "CRM data from MySQL operational database"
    connection: mysql_ops              # ← REQUIRED: points to profiles.yml
    tables:
      - name: customers
        description: "Customer master data"
      - name: orders
        description: "Sales orders"
      - name: products

  - name: warehouse
    description: "ERP data from Snowflake"
    connection: sf_warehouse           # ← REQUIRED: points to profiles.yml
    database: PROD_DB
    schema: RAW
    tables:
      - name: invoices
      - name: shipments
      - name: vendors

  - name: data_lake
    description: "Raw files from S3"
    connection: data_lake              # ← works with bucket storages too
    tables:
      - name: web_events
      - name: app_logs

  # ─── Default Target Sources (NO connection:) ───

  - name: analytics
    description: "Tables on the default PostgreSQL target"
    schema: public
    # NO connection: — lives on the default target
    tables:
      - name: seed_categories
      - name: dim_dates
      - name: historical_metrics

HOW DVT USES SOURCES

When a model references a source, DVT checks the source's connection:against the model's target to determine the execution path:

SOURCE CONFIGWHAT HAPPENSSQL DIALECT
No connection:Adapter pushdown — SQL runs directly on the default targetTarget's native SQL
connection: same engineExtraction — Sling pulls data into DuckDB cacheDuckDB SQL
connection: different engineExtraction — Sling pulls data into DuckDB cacheDuckDB SQL

If any source in a model has a connection:, the entire model switches to the extraction path and runs in DuckDB SQL. You cannot mix pushdown and extraction sources in the same model.

COMMON PATTERNS

Multi-source extraction model:

-- models/marts/cross_engine_report.sql
-- crm is on MySQL, warehouse is on Snowflake
-- Both get extracted → DuckDB joins them → result loads to default target

{{ config(materialized='table') }}

SELECT
    c.customer_name,
    i.invoice_date,
    i.total_amount
FROM {{ source('crm', 'customers') }} c          -- MySQL (extracted)
JOIN {{ source('warehouse', 'invoices') }} i      -- Snowflake (extracted)
    ON c.customer_id = i.customer_id

Default-target pushdown model:

-- models/staging/stg_categories.sql
-- analytics source has no connection: → pushdown on default target
-- SQL dialect: PostgreSQL (native)

{{ config(materialized='view') }}

SELECT id, name, parent_id
FROM {{ source('analytics', 'seed_categories') }}

Bucket storage source:

-- models/staging/stg_web_events.sql
-- data_lake source points to S3 → extracted via Sling

{{ config(materialized='table') }}

SELECT event_id, user_id, event_type, event_timestamp
FROM {{ source('data_lake', 'web_events') }}

SOURCE PROPERTIES REFERENCE

PROPERTYREQUIREDDESCRIPTION
nameyesUnique name for this source group
connectionnoprofiles.yml output name. Required for external sources. Must NOT be set for default-target sources.
databasenoDatabase name (overrides the output's default)
schemanoSchema name (overrides the output's default)
descriptionnoHuman-readable description for documentation
tablesyesList of table objects with at minimum a name field
tagsnoTags for source selection (dvt run --select tag:...)
freshnessnoFreshness check configuration (standard dbt)

COMMON MISTAKES

WRONG

sources:
  - name: analytics
    connection: pg_dev         # default target!
    tables:
      - name: dim_dates

CORRECT

sources:
  - name: analytics
    # No connection — on default target
    tables:
      - name: dim_dates

Don't add connection: for sources on the default target. It will trigger unnecessary extraction through DuckDB when the adapter can query the table directly.

WRONG

sources:
  - name: crm
    # Missing connection!
    tables:
      - name: customers        # lives on MySQL

CORRECT

sources:
  - name: crm
    connection: mysql_ops      # external source
    tables:
      - name: customers

External sources must have connection:. Without it, DVT assumes the table is on the default target and the query will fail with 'relation does not exist'.