
LOCAL FILESYSTEM

type: file

Read and write files directly from the local filesystem. Ideal for development, testing, and on-premises data pipelines where data lives on shared mounts or NAS.

PREREQUISITES

SDK: none (built-in) — local filesystem support is set up automatically by:

dvt sync

CONFIGURATION FIELDS

FIELD    REQUIRED  DEFAULT  DESCRIPTION
type     yes       -        Must be `file`
path     yes       -        Base directory path (absolute or relative to project root)
format   no        parquet  Default file format: csv, parquet, json, jsonl, delta, avro

PROFILES.YML EXAMPLE

my_project:
  target: prod_postgres
  outputs:
    prod_postgres:
      type: postgres
      host: localhost
      port: 5432
      user: analytics
      password: "{{ env_var('PG_PASSWORD') }}"
      dbname: analytics_db
      schema: public

    local_files:
      type: file
      path: /data/raw                  # absolute path to data directory
      format: csv                      # default format for writes

SOURCES.YML EXAMPLE

sources:
  - name: raw_files
    connection: local_files          # must match profiles.yml output name
    tables:
      - name: customers              # reads /data/raw/customers.csv (or .parquet, etc.)
      - name: transactions
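The table-to-file mapping shown in the comments above can be sketched as follows. This is an illustrative helper, not the DVT API; the lookup order (default format first, then the other supported extensions) is an assumption.

```python
from pathlib import Path

# Hypothetical sketch (not the DVT API) of how a source table name could
# resolve to a file under the configured base path. The default format
# from profiles.yml is tried first, then the other supported extensions.
FORMATS = ["csv", "parquet", "json", "jsonl", "delta", "avro"]

def resolve_table_path(base: str, table: str, default_format: str = "csv") -> Path:
    order = [default_format] + [f for f in FORMATS if f != default_format]
    for ext in order:
        candidate = Path(base) / f"{table}.{ext}"
        if candidate.exists():
            return candidate
    # No file found yet: fall back to the default-format path
    return Path(base) / f"{table}.{default_format}"
```

With `path: /data/raw` and `format: csv`, the source table `customers` would resolve to `/data/raw/customers.csv` when that file exists.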

MODEL EXAMPLE — BUCKET AS SOURCE

Sling extracts files from the bucket into the DuckDB cache. Model SQL executes in DuckDB (Postgres-like dialect).

-- models/staging/stg_customers.sql
-- Extraction: local files → DuckDB cache → Postgres
-- Written in DuckDB SQL dialect

{{ config(materialized='table') }}

SELECT
    id,
    name,
    email,
    created_at
FROM {{ source('raw_files', 'customers') }}
WHERE name IS NOT NULL

MODEL EXAMPLE — BUCKET AS TARGET

Model executes on the default target. Sling streams the result to the bucket in the specified format. Use config(format='...') to override the default format set in profiles.yml.

-- models/export/export_report.sql
-- Materialize to local filesystem
-- format overrides the default set in profiles.yml

{{ config(
    materialized='table',
    target='local_files',
    format='parquet'
) }}

SELECT *
FROM {{ ref('monthly_report') }}

-- Supported format values:
--   'csv'      — comma-separated values
--   'parquet'  — columnar (default, recommended)
--   'json'     — JSON objects
--   'jsonl'    — JSON Lines (one record per line)
--   'delta'    — Delta Lake (schema evolution, time travel)
--   'avro'     — Apache Avro

FILE FORMAT CONFIGURATION

The output format is resolved in order of priority:

1. config(format='delta') in the model file — highest priority
2. format: in the profiles.yml output — project default
3. parquet — built-in default
FORMAT       VALUE    NOTES
Parquet      parquet  Columnar, compressed. Best for analytics. Default.
Delta Lake   delta    Schema evolution, time travel, ACID transactions.
CSV          csv      Universal compatibility. No schema enforcement.
JSON         json     Nested structures, human-readable.
JSON Lines   jsonl    One record per line. Streamable.
Avro         avro     Schema embedded, compact binary. Kafka-friendly.
IPC / Arrow  ipc      Arrow IPC format. Zero-copy reads.
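The three-level resolution priority can be sketched as a simple lookup. The helper and dictionary names below are illustrative assumptions, not DVT internals.

```python
# Illustrative sketch of the three-level format resolution described
# above (helper and dict names are assumptions, not DVT internals).
def resolve_format(model_config: dict, profile_output: dict) -> str:
    if "format" in model_config:       # 1. config(format=...) in the model file
        return model_config["format"]
    if "format" in profile_output:     # 2. format: in the profiles.yml output
        return profile_output["format"]
    return "parquet"                   # 3. built-in default
```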

INCREMENTAL EXTRACTION

Incremental models work with local filesystem sources as well. DVT resolves the watermark from the target database, extracts only changed rows, and merges them in the DuckDB cache.

-- models/staging/stg_transactions.sql
-- Incremental extraction from local files

{{ config(
    materialized='incremental',
    unique_key='txn_id'
) }}

SELECT txn_id, amount, txn_date, store_id, updated_at
FROM {{ source('raw_files', 'transactions') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
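The watermark behavior of the model above can be sketched in plain Python. This is a conceptual stand-in, not DVT internals: on the first run everything is extracted; afterwards only rows newer than the target's current watermark (MAX(updated_at)) are pulled.

```python
# Illustrative sketch of the incremental extraction described above
# (plain-Python stand-in, not DVT internals): only rows newer than the
# target's current watermark, MAX(updated_at), are extracted.
def incremental_rows(source_rows, existing_rows, ts_key="updated_at"):
    if not existing_rows:                      # first run: full extraction
        return list(source_rows)
    watermark = max(r[ts_key] for r in existing_rows)
    return [r for r in source_rows if r[ts_key] > watermark]
```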

NOTES

  • Path must be accessible from the machine running DVT
  • No network protocol — use NFS/CIFS mounts for remote files
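Since `type: file` has no network protocol of its own, it can help to pre-flight the configured path from the machine that will run DVT. A minimal sketch (the helper name is an assumption, not a DVT utility):

```python
import os

# Hypothetical pre-flight check (not a DVT utility): confirm the
# configured base path is visible, readable, and writable from this
# machine before running DVT against it.
def check_path(path: str) -> dict:
    return {
        "exists": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }
```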

DEBUGGING

Use dvt debug to test bucket connectivity. DVT verifies read/write access for each configured storage connection, including local filesystem paths.

dvt debug