LOCAL FILESYSTEM
type: file
Read and write files directly from the local filesystem. Ideal for development, testing, and on-premises data pipelines where data lives on shared mounts or NAS.
PREREQUISITES
SDK: (none — built-in) — installed automatically by:
dvt sync
CONFIGURATION FIELDS
| FIELD | REQUIRED | DEFAULT | DESCRIPTION |
|---|---|---|---|
| type | yes | — | Must be `file` |
| path | yes | — | Base directory path (absolute or relative to project root) |
| format | no | parquet | Default file format: csv, parquet, json, jsonl, delta, avro |
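Since `path` may also be relative to the project root, a minimal sketch of such an output (the `dev_files` name and directory are illustrative, not from the reference):

```yaml
# Hypothetical output using a relative path; "dev_files" is an illustrative name
dev_files:
  type: file
  path: data/landing   # resolved relative to the project root
  # format omitted — falls back to the built-in default, parquet
```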
PROFILES.YML EXAMPLE
my_project:
  target: prod_postgres
  outputs:
    prod_postgres:
      type: postgres
      host: localhost
      port: 5432
      user: analytics
      password: "{{ env_var('PG_PASSWORD') }}"
      dbname: analytics_db
      schema: public
    local_files:
      type: file
      path: /data/raw   # absolute path to data directory
      format: csv       # default format for writes
SOURCES.YML EXAMPLE
sources:
  - name: raw_files
    connection: local_files   # must match profiles.yml output name
    tables:
      - name: customers       # reads /data/raw/customers.csv (or .parquet, etc.)
      - name: transactions
MODEL EXAMPLE — LOCAL FILES AS SOURCE
Sling extracts files from the local filesystem into the DuckDB cache. Model SQL executes in DuckDB (Postgres-like dialect).
-- models/staging/stg_customers.sql
-- Extraction: local files → DuckDB cache → Postgres
-- Written in DuckDB SQL dialect
{{ config(materialized='table') }}
SELECT
id,
name,
email,
created_at
FROM {{ source('raw_files', 'customers') }}
WHERE name IS NOT NULL
MODEL EXAMPLE — LOCAL FILES AS TARGET
Model executes on the default target. Sling streams the result to the configured local path in the specified format. Use config(format='...') to override the default format set in profiles.yml.
-- models/export/export_report.sql
-- Materialize to local filesystem
-- format overrides the default set in profiles.yml
{{ config(
materialized='table',
target='local_files',
format='parquet'
) }}
SELECT *
FROM {{ ref('monthly_report') }}
-- Supported format values:
-- 'csv' — comma-separated values
-- 'parquet' — columnar (default, recommended)
-- 'json' — JSON objects
-- 'jsonl' — JSON Lines (one record per line)
-- 'delta' — Delta Lake (schema evolution, time travel)
-- 'avro' — Apache Avro
FILE FORMAT CONFIGURATION
The output format is resolved in order of priority:
1. config(format='delta') in the model file — highest priority
2. format: in the profiles.yml output — project default
3. parquet — built-in default

| FORMAT | VALUE | NOTES |
|---|---|---|
| Parquet | parquet | Columnar, compressed. Best for analytics. Default. |
| Delta Lake | delta | Schema evolution, time travel, ACID transactions. |
| CSV | csv | Universal compatibility. No schema enforcement. |
| JSON | json | Nested structures, human-readable. |
| JSON Lines | jsonl | One record per line. Streamable. |
| Avro | avro | Schema embedded, compact binary. Kafka-friendly. |
| IPC / Arrow | ipc | Arrow IPC format. Zero-copy reads. |
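Written files can be inspected ad hoc with any DuckDB client outside DVT; a sketch, assuming a Parquet file was produced under the example base path (the file name is illustrative):

```sql
-- Ad-hoc verification in DuckDB; the path and file name are illustrative
SELECT COUNT(*) AS row_count
FROM read_parquet('/data/raw/export_report.parquet');
```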
INCREMENTAL EXTRACTION
Incremental models work with local file sources. DVT resolves the watermark from the target database, extracts only changed rows, and merges them in the DuckDB cache.
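As a rough illustration of the resulting query shape (the actual compiled SQL is generated by DVT; table names here are hypothetical), the watermark subquery is resolved against the already-built target table so only newer rows are pulled from the files:

```sql
-- Illustrative shape of an incremental extraction query
SELECT txn_id, amount, txn_date, store_id, updated_at
FROM transactions              -- cached file data in DuckDB
WHERE updated_at > (
    SELECT MAX(updated_at)     -- watermark from the existing target table
    FROM stg_transactions
);
```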
-- models/staging/stg_transactions.sql
-- Incremental extraction from local files
{{ config(
materialized='incremental',
unique_key='txn_id'
) }}
SELECT txn_id, amount, txn_date, store_id, updated_at
FROM {{ source('raw_files', 'transactions') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
NOTES
- ⚠ Path must be accessible from the machine running DVT
- ⚠ No network protocol — use NFS/CIFS mounts for remote files
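Before running DVT against a shared mount, a quick pre-flight check can surface permission problems early. A sketch; the DATA_DIR default reuses the example path from profiles.yml:

```shell
# Verify the configured base path is a readable, writable directory.
# DATA_DIR defaults to the example path from profiles.yml (illustrative).
DATA_DIR="${DATA_DIR:-/data/raw}"
if [ -d "$DATA_DIR" ] && [ -r "$DATA_DIR" ] && [ -w "$DATA_DIR" ]; then
    echo "path OK: $DATA_DIR"
else
    echo "path NOT accessible: $DATA_DIR" >&2
fi
```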
DEBUGGING
Use dvt debug to test filesystem connectivity. DVT verifies read/write access for each configured connection.
dvt debug