
LOCAL FILESYSTEM

type: file

Read and write files directly from the local filesystem. Ideal for development, testing, and on-premises data pipelines where data lives on shared mounts or NAS.

PREREQUISITES

SDK: none (built-in) — local filesystem support is set up automatically by:

dvt sync

CONFIGURATION FIELDS

FIELD    REQUIRED  DEFAULT  DESCRIPTION
type     yes       -        Must be `file`
path     yes       -        Base directory path (absolute or relative to project root)
format   no        parquet  Default file format: csv, parquet, json, jsonl, delta, avro

PROFILES.YML EXAMPLE

my_project:
  target: prod_postgres
  outputs:
    prod_postgres:
      type: postgres
      host: localhost
      port: 5432
      user: analytics
      password: "{{ env_var('PG_PASSWORD') }}"
      dbname: analytics_db
      schema: public

    local_files:
      type: file
      path: /data/raw                  # absolute path to data directory
      format: csv                      # default format for writes

SOURCES.YML EXAMPLE

sources:
  - name: raw_files
    connection: local_files          # must match profiles.yml output name
    tables:
      - name: customers              # reads /data/raw/customers.csv (or .parquet, etc.)
      - name: transactions
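The table-to-file mapping shown in the comments above can be sketched as follows. This is an illustrative helper, not the DVT API; the lookup order (default format first, then the other supported extensions) is an assumption.

```python
from pathlib import Path

# Hypothetical sketch (not the DVT API) of how a source table name could
# resolve to a file under the configured base path. The default format
# from profiles.yml is tried first, then the other supported extensions.
FORMATS = ["csv", "parquet", "json", "jsonl", "delta", "avro"]

def resolve_table_path(base: str, table: str, default_format: str = "csv") -> Path:
    order = [default_format] + [f for f in FORMATS if f != default_format]
    for ext in order:
        candidate = Path(base) / f"{table}.{ext}"
        if candidate.exists():
            return candidate
    # No file found yet: fall back to the default-format path
    return Path(base) / f"{table}.{default_format}"
```

With `path: /data/raw` and `format: csv`, the source table `customers` would resolve to `/data/raw/customers.csv` when that file exists.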

MODEL EXAMPLE — BUCKET AS SOURCE

Sling extracts files from the bucket into the DuckDB cache. Model SQL executes in DuckDB (Postgres-like dialect).

-- models/staging/stg_customers.sql
-- Extraction: local files → DuckDB cache → Postgres
-- Written in DuckDB SQL dialect

{{ config(materialized='table') }}

SELECT
    id,
    name,
    email,
    created_at
FROM {{ source('raw_files', 'customers') }}
WHERE name IS NOT NULL

MODEL EXAMPLE — BUCKET AS TARGET

Model executes on the default target. Sling streams the result to the bucket in the specified format. Use config(format='...') to override the default format set in profiles.yml.

-- models/export/export_report.sql
-- Materialize to local filesystem
-- format overrides the default set in profiles.yml

{{ config(
    materialized='table',
    target='local_files',
    format='parquet'
) }}

SELECT *
FROM {{ ref('monthly_report') }}

-- Supported format values:
--   'csv'      — comma-separated values
--   'parquet'  — columnar (default, recommended)
--   'json'     — JSON objects
--   'jsonl'    — JSON Lines (one record per line)
--   'delta'    — Delta Lake (schema evolution, time travel)
--   'avro'     — Apache Avro

FILE FORMAT CONFIGURATION

The output format is resolved in order of priority:

1. config(format='delta') in the model file — highest priority
2. format: in the profiles.yml output — project default
3. parquet — built-in default
FORMAT       VALUE    NOTES
Parquet      parquet  Columnar, compressed. Best for analytics. Default.
Delta Lake   delta    Schema evolution, time travel, ACID transactions.
CSV          csv      Universal compatibility. No schema enforcement.
JSON         json     Nested structures, human-readable.
JSON Lines   jsonl    One record per line. Streamable.
Avro         avro     Schema embedded, compact binary. Kafka-friendly.
IPC / Arrow  ipc      Arrow IPC format. Zero-copy reads.
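The three-level resolution priority can be sketched as a simple lookup. The helper and dictionary names below are illustrative assumptions, not DVT internals.

```python
# Illustrative sketch of the three-level format resolution described
# above (helper and dict names are assumptions, not DVT internals).
def resolve_format(model_config: dict, profile_output: dict) -> str:
    if "format" in model_config:       # 1. config(format=...) in the model file
        return model_config["format"]
    if "format" in profile_output:     # 2. format: in the profiles.yml output
        return profile_output["format"]
    return "parquet"                   # 3. built-in default
```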

INCREMENTAL EXTRACTION

Incremental models work with local filesystem sources as well. DVT resolves the watermark from the target database, extracts only changed rows, and merges them in the DuckDB cache.

-- models/staging/stg_transactions.sql
-- Incremental extraction from local files

{{ config(
    materialized='incremental',
    unique_key='txn_id'
) }}

SELECT txn_id, amount, txn_date, store_id, updated_at
FROM {{ source('raw_files', 'transactions') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
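The watermark behavior of the model above can be sketched in plain Python. This is a conceptual stand-in, not DVT internals: on the first run everything is extracted; afterwards only rows newer than the target's current watermark (MAX(updated_at)) are pulled.

```python
# Illustrative sketch of the incremental extraction described above
# (plain-Python stand-in, not DVT internals): only rows newer than the
# target's current watermark, MAX(updated_at), are extracted.
def incremental_rows(source_rows, existing_rows, ts_key="updated_at"):
    if not existing_rows:                      # first run: full extraction
        return list(source_rows)
    watermark = max(r[ts_key] for r in existing_rows)
    return [r for r in source_rows if r[ts_key] > watermark]
```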

NOTES

  • Path must be accessible from the machine running DVT
  • No network protocol — use NFS/CIFS mounts for remote files
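Since `type: file` has no network protocol of its own, it can help to pre-flight the configured path from the machine that will run DVT. A minimal sketch (the helper name is an assumption, not a DVT utility):

```python
import os

# Hypothetical pre-flight check (not a DVT utility): confirm the
# configured base path is visible, readable, and writable from this
# machine before running DVT against it.
def check_path(path: str) -> dict:
    return {
        "exists": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }
```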

DEBUGGING

Use dvt debug to test bucket connectivity. DVT verifies read/write access for each configured storage connection, including local filesystem paths.

dvt debug