GOOGLE CLOUD STORAGE

type: gcs

Google's unified object storage. Tightly integrated with BigQuery, Dataflow, and the GCP ecosystem.

PREREQUISITES

SDK: google-cloud-storage — installed automatically by:

dvt sync

CONFIGURATION FIELDS

FIELD     REQUIRED  DEFAULT  DESCRIPTION
type      yes                Must be `gcs`
bucket    yes                GCS bucket name
project   no                 GCP project ID
format    no        parquet  Default file format: csv, parquet, json, jsonl, delta, avro

PROFILES.YML EXAMPLE

my_project:
  target: prod_bigquery
  outputs:
    prod_bigquery:
      type: bigquery
      project: my-gcp-project
      dataset: analytics
      location: US

    gcs_lake:
      type: gcs
      bucket: my-gcs-data-lake
      project: my-gcp-project
      format: parquet                  # default format for writes

SOURCES.YML EXAMPLE

sources:
  - name: raw_gcs
    connection: gcs_lake             # must match profiles.yml output name
    tables:
      - name: web_events
      - name: app_sessions
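
The connection field must name an output defined in profiles.yml. That cross-check can be sketched as follows (hypothetical helper, not part of DVT; plain dicts stand in for the parsed YAML):

```python
def unresolved_connections(outputs: dict, sources: list) -> list:
    """Return source names whose `connection` matches no profiles.yml output."""
    return [s["name"] for s in sources if s.get("connection") not in outputs]

# Mirrors the profiles.yml and sources.yml examples above
profiles_outputs = {"prod_bigquery": {}, "gcs_lake": {}}
sources = [
    {"name": "raw_gcs", "connection": "gcs_lake"},
]

print(unresolved_connections(profiles_outputs, sources))  # → []
```

A typo in connection (e.g. `gcs_lak`) would surface here as an unresolved name rather than failing later at extraction time.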

MODEL EXAMPLE — BUCKET AS SOURCE

Sling extracts files from the bucket into the DuckDB cache. Model SQL executes in DuckDB (Postgres-like dialect).

-- models/staging/stg_web_events.sql
-- Extraction: GCS → DuckDB cache → BigQuery
-- Written in DuckDB SQL dialect

{{ config(materialized='table') }}

SELECT
    event_id,
    user_id,
    event_type,
    event_timestamp
FROM {{ source('raw_gcs', 'web_events') }}

MODEL EXAMPLE — BUCKET AS TARGET

Model executes on the default target. Sling streams the result to the bucket in the specified format. Use config(format='...') to override the default format set in profiles.yml.

-- models/export/export_report.sql
-- Materialize to GCS bucket
-- format overrides the default set in profiles.yml

{{ config(
    materialized='table',
    target='gcs_lake',
    format='csv'
) }}

SELECT *
FROM {{ ref('monthly_report') }}

-- Supported format values:
--   'csv'      — comma-separated values
--   'parquet'  — columnar (default, recommended)
--   'json'     — JSON objects
--   'jsonl'    — JSON Lines (one record per line)
--   'delta'    — Delta Lake (schema evolution, time travel)
--   'avro'     — Apache Avro

FILE FORMAT CONFIGURATION

The output format is resolved in order of priority:

1. config(format='delta') in the model file — highest priority
2. format: on the profiles.yml output — project default
3. parquet — built-in default
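
The resolution order can be sketched as a small helper (hypothetical, for illustration only):

```python
def resolve_format(model_config: dict, profile_output: dict) -> str:
    """Resolve the write format using the priority order described above (sketch)."""
    # 1. config(format=...) in the model file wins
    if "format" in model_config:
        return model_config["format"]
    # 2. format: on the profiles.yml output is the project default
    if "format" in profile_output:
        return profile_output["format"]
    # 3. built-in default
    return "parquet"

print(resolve_format({"format": "csv"}, {"format": "parquet"}))  # → csv
print(resolve_format({}, {"format": "delta"}))                   # → delta
print(resolve_format({}, {}))                                    # → parquet
```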

FORMAT       VALUE    NOTES
Parquet      parquet  Columnar, compressed. Best for analytics. Default.
Delta Lake   delta    Schema evolution, time travel, ACID transactions.
CSV          csv      Universal compatibility. No schema enforcement.
JSON         json     Nested structures, human-readable.
JSON Lines   jsonl    One record per line. Streamable.
Avro         avro     Schema embedded, compact binary. Kafka-friendly.
IPC / Arrow  ipc      Arrow IPC format. Zero-copy reads.

INCREMENTAL EXTRACTION

Incremental models work with cloud storage sources. DVT resolves the watermark from the target database, extracts only changed rows, and merges them in the DuckDB cache.

-- models/staging/stg_sessions.sql
-- Incremental extraction from GCS

{{ config(
    materialized='incremental',
    unique_key='session_id'
) }}

SELECT session_id, user_id, started_at, ended_at
FROM {{ source('raw_gcs', 'app_sessions') }}
{% if is_incremental() %}
WHERE started_at > (SELECT MAX(started_at) FROM {{ this }})
{% endif %}
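
The is_incremental() branch above compiles down to a watermark filter. A rough sketch of that behavior (hypothetical helper, not DVT's actual implementation):

```python
def apply_watermark(query: str, column: str, watermark):
    """Append a watermark filter on incremental runs; full extraction otherwise."""
    if watermark is None:
        # First run: no target table exists yet, so extract everything
        return query
    # Subsequent runs: only rows newer than the target's high-water mark
    return f"{query}\nWHERE {column} > '{watermark}'"

base = "SELECT session_id, user_id, started_at FROM app_sessions"
print(apply_watermark(base, "started_at", None))                   # full query
print(apply_watermark(base, "started_at", "2024-01-01 00:00:00"))  # filtered query
```

Here the watermark value stands in for the result of the MAX(started_at) subquery that DVT resolves from the target database.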

NOTES

  • Uses Application Default Credentials — run `gcloud auth application-default login`
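
For non-interactive environments (CI, production), ADC also honors a service-account key file via the standard GOOGLE_APPLICATION_CREDENTIALS variable; a typical setup (the key path is a placeholder):

```shell
# Local development: interactive ADC login
gcloud auth application-default login

# CI / production: point ADC at a service-account key (placeholder path)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```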

DEBUGGING

Use dvt debug to test bucket connectivity. DVT verifies read/write access for each cloud storage connection.

dvt debug