dlthubsnow

Small, reviewable demo for running a dlt GitHub Issues pipeline inside Snowflake with Snowpark Container Services, then transforming the landed raw table with a minimal dbt layer executed through Snowflake-native dbt Projects.

Supported runtime model

This repo supports one runtime path for the dlt load:

  • Snowpark Container Services job service

The Python package is still developed and validated locally with uv, but the demo runtime is the Snowflake container job, not uv run dlthubsnow.

End-to-end flow

GitHub Issues API
  -> container image built from this repo
  -> Snowpark Container Services job service
  -> dlt pipeline/resource (incremental on updated_at, merge key id)
  -> Snowflake raw table: <DB>.<DLT_DATASET>.github_issues
  -> dbt source: github_raw.github_issues
  -> Snowflake DBT PROJECT object
  -> dbt curated view: <DB>.<DBT_SCHEMA>.github_issues_curated
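The incremental/merge step in the flow above can be sketched in plain Python. This is not the repo's pipeline code and uses no dlt APIs; it is a stdlib illustration of the semantics: rows are upserted on the merge key id, and an incremental cursor on updated_at filters out rows the pipeline has already seen.

```python
def merge_incremental(existing, new_rows, cursor):
    """Illustrative sketch (not the repo's code): merge on primary key `id`,
    advance an incremental cursor on `updated_at` (ISO-8601 strings compare
    lexicographically, so string comparison is safe here)."""
    # Only rows updated after the stored cursor count as new work.
    fresh = [r for r in new_rows if r["updated_at"] > cursor]
    table = {r["id"]: r for r in existing}
    for row in fresh:
        table[row["id"]] = row  # upsert: new id inserts, existing id updates
    new_cursor = max([cursor] + [r["updated_at"] for r in fresh])
    return list(table.values()), new_cursor
```

In the real pipeline, dlt persists the cursor and performs the merge in Snowflake; the sketch only shows why reruns neither duplicate rows nor refetch old ones.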

Why this uses a job service

Snowpark Container Services supports both long-running services and job services.

This demo uses a job service because the dlt load is a finite batch run: it starts, fetches GitHub issues, writes to Snowflake, and exits. That maps directly to EXECUTE JOB SERVICE and is a better fit than keeping a long-running service alive for a one-shot ingestion task.

Default demo object names

Core data/dbt objects:

  • raw database: DLT_GITHUB_DEMO
  • raw dlt schema/dataset: github_issues_demo
  • curated dbt schema: github_issues_analytics
  • curated model: github_issues_curated
  • dbt project object example name: github_issues_demo_project

Snowpark Container Services examples in this repo:

  • compute pool: DLT_GITHUB_DEMO_POOL
  • image repository: DLT_GITHUB_REPO
  • spec stage: DLT_GITHUB_CONTAINER_STAGE
  • network rule: DLT_GITHUB_GH_API_RULE
  • external access integration: DLT_GITHUB_GH_EAI
  • example job service name: GITHUB_ISSUES_LOAD_JOB

Repository layout

.
├── .github/workflows/manual-load.yml
├── container/
│   ├── Dockerfile
│   └── entrypoint.sh
├── dbt_project.yml
├── models/
│   ├── curated/
│   │   ├── github_issues_curated.sql
│   │   └── schema.yml
│   └── sources.yml
├── profiles/
│   └── profiles.template.yml
├── snowflake/
│   ├── execute_container_job.example.sql
│   ├── github_issues_job.yaml
│   ├── setup_container_demo.sql
│   ├── setup_demo.sql
│   └── validate_demo.sql
├── src/dlthubsnow/
├── tests/
├── pyproject.toml
└── uv.lock

Generated dbt outputs (target/, logs/), local profile/secrets files, and local container override env files stay out of git.

Setup

1) Sync the repo for validation/build tooling

Install uv first if needed, then sync:

uv sync --python 3.13

This repo uses uv for local validation and container build preparation; local tooling requires Python 3.10+. The demo runtime path does not use uv run dlthubsnow.

2) Validate the Snow CLI connection used in this repo

All setup and validation examples use a Snow CLI connection named mpz:

snow connection test --connection mpz

If you need elevated privileges for setup, pass an explicit role override such as:

snow sql --connection mpz --role ACCOUNTADMIN -q "select current_role()"

mpz does not currently provide a default warehouse or database, so the commands in this repo keep those explicit.

3) Bootstrap the core warehouse/database/dbt objects

Run the shared setup SQL:

snow sql --connection mpz --role ACCOUNTADMIN -f snowflake/setup_demo.sql

This creates:

  • DLT_GITHUB_DEMO_WH
  • DLT_GITHUB_DEMO
  • DLT_GITHUB_DEMO.GITHUB_ISSUES_ANALYTICS

That curated schema must exist before the Snowflake-side dbt compile/execute steps can succeed.

4) Bootstrap the Snowpark Container Services objects

Run the container setup SQL:

snow sql --connection mpz --role ACCOUNTADMIN -f snowflake/setup_container_demo.sql

That file creates:

  • the compute pool
  • the image repository
  • the spec stage
  • the GitHub egress network rule
  • the external access integration

The compute pool is created with INITIALLY_SUSPENDED = TRUE so the setup step itself does not start compute.

5) Snowflake auth inside the containerized dlt run

No Snowflake connection secret is required for the dlt job running inside Snowpark Container Services.

This repo uses Snowflake's auto-mounted OAuth token inside the container together with dlt's Snowflake destination support for:

  • DESTINATION__SNOWFLAKE__CREDENTIALS__AUTHENTICATOR=oauth
  • DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE=<database>
  • DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE=<warehouse>

With dlt>=1.23.0, leaving the Snowflake host and token unset is intentional: dlt reads /snowflake/session/token and the Snowflake-provided environment automatically.

If you want higher GitHub API limits, you can still create an optional GENERIC_STRING secret for a GitHub token and add a separate secret mapping in snowflake/github_issues_job.yaml.
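The environment contract above can be made concrete with a small, hypothetical helper (not part of this repo). The variable names come from the section above; the token path is Snowflake's documented SPCS mount point, and dlt>=1.23.0 reads it on its own:

```python
import os


def spcs_snowflake_env(database, warehouse):
    """Return the env vars the job spec sets for dlt's Snowflake destination
    (names as listed above; illustrative helper, not repo code)."""
    return {
        "DESTINATION__SNOWFLAKE__CREDENTIALS__AUTHENTICATOR": "oauth",
        "DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE": database,
        "DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE": warehouse,
    }


def read_spcs_token(path="/snowflake/session/token"):
    """Shown only to make the mechanism visible: SPCS injects a short-lived
    OAuth token at this path; dlt itself reads it, so no code like this is
    needed in the container."""
    with open(path) as f:
        return f.read().strip()
```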

Build and publish the container image

The recommended path now matches the reference SPCS repo pattern: build and publish from GitHub Actions so developers do not need local Docker just to deploy this demo.

GitHub Actions path (recommended, no local Docker)

Configure these repository secrets first:

  • SNOWFLAKE_ACCOUNT: your Snowflake org-account identifier
  • SNOWFLAKE_USER: the user GitHub Actions should authenticate as
  • SNOWFLAKE_PRIVATE_KEY: base64-encoded RSA private key PEM for that user
  • SNOWFLAKE_ROLE: role with access to the demo image repository, stage, compute pool, warehouse, integration, and job service creation

Then use the new workflows:

  1. Run Setup Snowflake SPCS infrastructure once. It authenticates with the Snowflake CLI from GitHub Actions and executes:
    • snowflake/setup_demo.sql
    • snowflake/setup_container_demo.sql
  2. Run Build and run SPCS job whenever you want a load. That workflow:
    • builds the image on the GitHub runner for linux/amd64
    • logs into the Snowflake image registry
    • pushes dlthubsnow:<sha> and dlthubsnow:latest
    • uploads snowflake/github_issues_job.yaml to the demo stage
    • executes the SPCS job service and waits for completion using Snowflake-provided OAuth inside the container

The workflow inputs default to the demo object names in this repo, but you can override them at dispatch time.

Local Docker path (optional manual alternative)

If you prefer to build and push manually from your workstation, the original Docker flow still works.

1) Build the image locally

Snowpark Container Services requires a linux/amd64 image:

docker buildx build --platform linux/amd64 -f container/Dockerfile -t dlthubsnow:latest --load .

The image:

  • installs the package with uv sync --locked --no-dev
  • runs as a non-root user
  • starts through container/entrypoint.sh
  • executes the packaged dlthubsnow CLI inside the container

2) Authenticate Docker to the Snowflake registry

snow spcs image-registry login --connection mpz --role ACCOUNTADMIN

3) Resolve the repository URL and push the image

snow spcs image-repository url \
  --connection mpz \
  --role ACCOUNTADMIN \
  --database DLT_GITHUB_DEMO \
  --schema PUBLIC \
  DLT_GITHUB_REPO

Then tag and push:

export IMAGE_REPO_URL='<registry>/<db>/<schema>/<repository>'
docker tag dlthubsnow:latest "$IMAGE_REPO_URL/dlthubsnow:latest"
docker push "$IMAGE_REPO_URL/dlthubsnow:latest"

Stage the job specification

Upload the Snowpark Container Services job spec template:

snow sql --connection mpz --role ACCOUNTADMIN -q "PUT file://$(pwd)/snowflake/github_issues_job.yaml @DLT_GITHUB_DEMO.PUBLIC.DLT_GITHUB_CONTAINER_STAGE AUTO_COMPRESS=FALSE OVERWRITE=TRUE"

Execute the dlt load inside Snowflake

Use the provided example SQL file:

snow sql --connection mpz --role ACCOUNTADMIN -f snowflake/execute_container_job.example.sql

The example matters here: Snowflake specification template arguments must be alphanumeric or valid JSON, so values such as dlt-hub, image URLs, and schema names with underscores are passed as JSON strings in the USING (...) clause.

That example:

  1. drops any prior demo service with the same name
  2. runs EXECUTE JOB SERVICE asynchronously
  3. waits for completion with !SPCS_WAIT_FOR
  4. describes the job service at the end

Because this is a job service, the container exits when the load finishes.

The job spec does not mount Snowflake credentials. Instead, dlt uses Snowflake's OAuth token injected into the running SPCS container.
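The JSON-quoting rule above can be sketched with a hypothetical helper (using_clause is not part of this repo): any value that is not purely alphanumeric is rendered as a JSON string before it lands in the USING (...) clause.

```python
import json


def using_clause(params):
    """Illustrative only: render EXECUTE JOB SERVICE template arguments,
    JSON-quoting each value as the rule above requires (so 'dlt-hub',
    image URLs, and schema names with underscores pass validation)."""
    parts = [f"{name} => {json.dumps(value)}" for name, value in params.items()]
    return "USING (" + ", ".join(parts) + ")"
```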

Why incremental reruns still work in a job container

dlt restores pipeline state from the destination by default. That means the container does not need a persistent local state directory to keep incremental progress across runs.

In practice:

  • _dlt_pipeline_state lives in Snowflake
  • _dlt_loads records load history in Snowflake
  • a fresh job container can restore state from the destination dataset and continue incrementally
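Why this works can be simulated with stdlib Python (a sketch, not dlt internals): the cursor state round-trips through the destination, so a brand-new container picks up where the last run stopped.

```python
class DestinationState:
    """Stand-in for the _dlt_pipeline_state table in Snowflake."""

    def __init__(self):
        self._state = {}

    def save(self, state):
        self._state = dict(state)

    def restore(self):
        return dict(self._state)


def run_job(destination, fetch_since):
    """One job-container run: restore state from the destination,
    fetch incrementally, persist the advanced cursor back."""
    state = destination.restore()  # fresh container, no local state needed
    cursor = state.get("updated_at", "1970-01-01T00:00:00Z")
    rows = fetch_since(cursor)  # incremental fetch from the API
    if rows:
        state["updated_at"] = max(r["updated_at"] for r in rows)
    destination.save(state)
    return rows
```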

Validate the raw tables

After a successful job run:

snow sql --connection mpz --role ACCOUNTADMIN -f snowflake/validate_demo.sql

That validates:

  • github_issues exists
  • _dlt_loads exists
  • _dlt_pipeline_state exists

dbt layer

After the containerized dlt load lands raw issues in Snowflake, dbt reads that raw table as a source and builds a single curated view:

  • source: {{ source('github_raw', 'github_issues') }}
  • model: github_issues_curated
  • default raw schema: github_issues_demo
  • default curated schema: github_issues_analytics

The model intentionally stays small and SQL-only. It derives:

  • is_open
  • issue_age_days
  • days_since_update
  • label_count
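For intuition, the derived columns can be mirrored in Python (illustrative only; the real logic lives in models/curated/github_issues_curated.sql and runs as SQL in Snowflake):

```python
from datetime import datetime, timezone


def curate_issue(issue, now=None):
    """Python mirror of the columns the curated SQL view derives."""
    now = now or datetime.now(timezone.utc)
    created = datetime.fromisoformat(issue["created_at"].replace("Z", "+00:00"))
    updated = datetime.fromisoformat(issue["updated_at"].replace("Z", "+00:00"))
    return {
        "is_open": issue["state"] == "open",
        "issue_age_days": (now - created).days,
        "days_since_update": (now - updated).days,
        "label_count": len(issue.get("labels") or []),
    }
```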

Deploy and execute dbt in Snowflake

Snowflake dbt Projects require a profiles.yml, but Snowflake documentation warns that deployment stops if the committed profile contains passwords.

This repo therefore commits only profiles/profiles.template.yml. Create a temporary runtime copy:

export DBT_PROJECT_NAME='github_issues_demo_project'
export DBT_SNOWFLAKE_ROLE='<role>'
export DBT_SNOWFLAKE_WAREHOUSE='DLT_GITHUB_DEMO_WH'
export DBT_DATABASE='DLT_GITHUB_DEMO'
export DBT_SOURCE_SCHEMA='github_issues_demo'
export DBT_SCHEMA='github_issues_analytics'

tmpdir="$(mktemp -d)"
cp profiles/profiles.template.yml "$tmpdir/profiles.yml"

Deploy the DBT PROJECT object:

snow dbt deploy \
  --connection mpz \
  --database DLT_GITHUB_DEMO \
  --schema GITHUB_ISSUES_ANALYTICS \
  --warehouse DLT_GITHUB_DEMO_WH \
  --role "$DBT_SNOWFLAKE_ROLE" \
  --source . \
  --profiles-dir "$tmpdir" \
  --default-target dev \
  "$DBT_PROJECT_NAME"

Execute the curated model:

snow dbt execute \
  --connection mpz \
  --database DLT_GITHUB_DEMO \
  --schema GITHUB_ISSUES_ANALYTICS \
  --warehouse DLT_GITHUB_DEMO_WH \
  --role "$DBT_SNOWFLAKE_ROLE" \
  "$DBT_PROJECT_NAME" \
  run --select github_issues_curated

Clean up the temporary profile:

rm -rf "$tmpdir"

Optional curated-model spot check:

snow sql --connection mpz --role ACCOUNTADMIN -q "select issue_id, issue_number, is_open, issue_age_days from DLT_GITHUB_DEMO.GITHUB_ISSUES_ANALYTICS.GITHUB_ISSUES_CURATED limit 20"

Local validation only

These local checks exist to validate the repo and Python/dbt assets. They are not the supported runtime path for loading data.

uv run pytest

tmpdir="$(mktemp -d)"
cp profiles/profiles.template.yml "$tmpdir/profiles.yml"
uv run --python 3.13 dbt parse --profiles-dir "$tmpdir"
rm -rf "$tmpdir"

sh -n container/entrypoint.sh

If you also want to spot-check the container image locally and already have Docker installed, you can still run:

docker buildx build --platform linux/amd64 -f container/Dockerfile -t dlthubsnow:ci --load .

GitHub Actions

This repo now has three GitHub Actions workflows:

  • .github/workflows/manual-load.yml validates the repo (uv sync, pytest, dbt parse, entrypoint syntax, CI image build)
  • .github/workflows/setup-spcs-infra.yml runs the Snowflake setup SQL from GitHub Actions so you do not need local Docker or local Snow CLI for the one-time bootstrap
  • .github/workflows/deploy-spcs-job.yml builds the image on the GitHub runner, pushes it to the Snowflake registry, stages the job spec, and executes the job service with Snowflake-provided OAuth inside the container

That means the recommended deployment path no longer depends on local Docker.

What is still environment-dependent

This repo now contains the assets and docs for the Snowflake-container runtime path, but the following still depend on a live Snowflake environment:

  • compute pool creation privileges
  • image repository access
  • a Docker-capable build environment pushing to the Snowflake registry (the GitHub-hosted runner is the default choice)
  • stage upload
  • secret creation
  • live EXECUTE JOB SERVICE
  • live snow dbt deploy / snow dbt execute

Notes for reviewers

  • The dlt execution model is now Snowpark Container Services job service only.
  • The local Python CLI remains in the repo because the container runs that packaged entrypoint.
  • Secrets are intentionally excluded from version control.
  • The dbt layer remains intentionally lightweight: one source and one curated model.
