Small, reviewable demo for running a dlt GitHub Issues pipeline inside Snowflake with Snowpark Container Services, then transforming the landed raw table with a minimal dbt layer executed through Snowflake-native dbt Projects.
This repo supports one runtime path for the dlt load:
- Snowpark Container Services job service
The Python package is still developed and validated locally with uv, but the demo runtime is the Snowflake container job, not `uv run dlthubsnow`.
```
GitHub Issues API
  -> container image built from this repo
  -> Snowpark Container Services job service
  -> dlt pipeline/resource (incremental on updated_at, merge key id)
  -> Snowflake raw table: <DB>.<DLT_DATASET>.github_issues
  -> dbt source: github_raw.github_issues
  -> Snowflake DBT PROJECT object
  -> dbt curated view: <DB>.<DBT_SCHEMA>.github_issues_curated
```
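The incremental/merge step in that flow means each run fetches only issues updated since the last cursor and upserts them on `id`. A minimal, dependency-free sketch of that semantics (the helper name `merge_increment` is hypothetical; in the repo this is handled by dlt's incremental cursor and `merge` write disposition):

```python
def merge_increment(table, batch, cursor):
    """Apply one incremental batch to a landed table.

    table:  dict mapping issue id -> latest row
    batch:  issue rows fetched from the API
    cursor: ISO-8601 timestamp of the last successful run
    Returns the merged table and the advanced cursor.
    """
    new_cursor = cursor
    for row in batch:
        # Incremental filter: ignore rows not updated since the cursor.
        if row["updated_at"] <= cursor:
            continue
        # Merge on the primary key: a newer version replaces the old row.
        table[row["id"]] = row
        new_cursor = max(new_cursor, row["updated_at"])
    return table, new_cursor
```

Same-format ISO-8601 UTC strings compare correctly as plain strings, which is why lexicographic comparison is enough for this sketch.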
Snowpark Container Services supports both long-running services and job services.
This demo uses a job service because the dlt load is a finite batch run: it starts, fetches GitHub issues, writes to Snowflake, and exits. That maps directly to EXECUTE JOB SERVICE and is a better fit than keeping a long-running service alive for a one-shot ingestion task.
Core data/dbt objects:
- raw database: `DLT_GITHUB_DEMO`
- raw dlt schema/dataset: `github_issues_demo`
- curated dbt schema: `github_issues_analytics`
- curated model: `github_issues_curated`
- dbt project object example name: `github_issues_demo_project`
Snowpark Container Services examples in this repo:
- compute pool: `DLT_GITHUB_DEMO_POOL`
- image repository: `DLT_GITHUB_REPO`
- spec stage: `DLT_GITHUB_CONTAINER_STAGE`
- network rule: `DLT_GITHUB_GH_API_RULE`
- external access integration: `DLT_GITHUB_GH_EAI`
- example job service name: `GITHUB_ISSUES_LOAD_JOB`
```
.
├── .github/workflows/manual-load.yml
├── container/
│   ├── Dockerfile
│   └── entrypoint.sh
├── dbt_project.yml
├── models/
│   ├── curated/
│   │   ├── github_issues_curated.sql
│   │   └── schema.yml
│   └── sources.yml
├── profiles/
│   └── profiles.template.yml
├── snowflake/
│   ├── execute_container_job.example.sql
│   ├── github_issues_job.yaml
│   ├── setup_container_demo.sql
│   ├── setup_demo.sql
│   └── validate_demo.sql
├── src/dlthubsnow/
├── tests/
├── pyproject.toml
└── uv.lock
```
Generated dbt outputs (target/, logs/), local profile/secrets files, and local container override env files stay out of git.
Install uv first if needed, then sync:
```sh
uv sync --python 3.13
```

This repo uses uv for local validation and container build preparation. Local tooling now requires Python 3.10+. The demo runtime path does not use `uv run dlthubsnow`.
All setup and validation examples use the working Snow CLI connection mpz:
```sh
snow connection test --connection mpz
```

If you need elevated privileges for setup, pass an explicit role override such as:

```sh
snow sql --connection mpz --role ACCOUNTADMIN -q "select current_role()"
```

mpz does not currently provide a default warehouse or database, so the commands in this repo keep those explicit.
Run the shared setup SQL:
```sh
snow sql --connection mpz --role ACCOUNTADMIN -f snowflake/setup_demo.sql
```

This creates:

- `DLT_GITHUB_DEMO_WH` (warehouse)
- `DLT_GITHUB_DEMO` (database)
- `DLT_GITHUB_DEMO.GITHUB_ISSUES_ANALYTICS` (curated schema)
That curated schema must exist before Snowflake-side dbt compile/execute succeeds.
Run the container setup SQL:
```sh
snow sql --connection mpz --role ACCOUNTADMIN -f snowflake/setup_container_demo.sql
```

That file creates:
- the compute pool
- the image repository
- the spec stage
- the GitHub egress network rule
- the external access integration
The compute pool is created with INITIALLY_SUSPENDED = TRUE so the setup step itself does not start compute.
No Snowflake connection secret is required for the dlt job running inside Snowpark Container Services.
This repo uses Snowflake's auto-mounted OAuth token inside the container together with dlt's Snowflake destination support for:
```sh
DESTINATION__SNOWFLAKE__CREDENTIALS__AUTHENTICATOR=oauth
DESTINATION__SNOWFLAKE__CREDENTIALS__DATABASE=<database>
DESTINATION__SNOWFLAKE__CREDENTIALS__WAREHOUSE=<warehouse>
```
With dlt>=1.23.0, leaving the Snowflake host and token unset is intentional: dlt reads /snowflake/session/token and the Snowflake-provided environment automatically.
If you want higher GitHub API limits, you can still create an optional GENERIC_STRING secret for a GitHub token and add a separate secret mapping in snowflake/github_issues_job.yaml.
The recommended path now matches the reference SPCS repo pattern: build and publish from GitHub Actions so developers do not need local Docker just to deploy this demo.
Configure these repository secrets first:
| Secret | Value |
|---|---|
| `SNOWFLAKE_ACCOUNT` | Your Snowflake org-account identifier |
| `SNOWFLAKE_USER` | The user GitHub Actions should authenticate as |
| `SNOWFLAKE_PRIVATE_KEY` | Base64-encoded RSA private key PEM for that user |
| `SNOWFLAKE_ROLE` | Role with access to the demo image repository, stage, compute pool, warehouse, integration, and job service creation |
Then use the new workflows:
- Run `Setup Snowflake SPCS infrastructure` once. It authenticates with the Snowflake CLI from GitHub Actions and executes `snowflake/setup_demo.sql` and `snowflake/setup_container_demo.sql`.
- Run `Build and run SPCS job` whenever you want a load. That workflow:
  - builds the image on the GitHub runner for `linux/amd64`
  - logs into the Snowflake image registry
  - pushes `dlthubsnow:<sha>` and `dlthubsnow:latest`
  - uploads `snowflake/github_issues_job.yaml` to the demo stage
  - executes the SPCS job service and waits for completion using Snowflake-provided OAuth inside the container
The workflow inputs default to the demo object names in this repo, but you can override them at dispatch time.
If you prefer to build and push manually from your workstation, the original Docker flow still works.
Snowpark Container Services requires a linux/amd64 image:
```sh
docker buildx build --platform linux/amd64 -f container/Dockerfile -t dlthubsnow:latest --load .
```

The image:

- installs the package with `uv sync --locked --no-dev`
- runs as a non-root user
- starts through `container/entrypoint.sh`
- executes the packaged `dlthubsnow` CLI inside the container
```sh
snow spcs image-registry login --connection mpz --role ACCOUNTADMIN
snow spcs image-repository url \
  --connection mpz \
  --role ACCOUNTADMIN \
  --database DLT_GITHUB_DEMO \
  --schema PUBLIC \
  DLT_GITHUB_REPO
```

Then tag and push:
```sh
export IMAGE_REPO_URL='<registry>/<db>/<schema>/<repository>'
docker tag dlthubsnow:latest "$IMAGE_REPO_URL/dlthubsnow:latest"
docker push "$IMAGE_REPO_URL/dlthubsnow:latest"
```

Upload the Snowpark Container Services job spec template:

```sh
snow sql --connection mpz --role ACCOUNTADMIN -q "PUT file://$(pwd)/snowflake/github_issues_job.yaml @DLT_GITHUB_DEMO.PUBLIC.DLT_GITHUB_CONTAINER_STAGE AUTO_COMPRESS=FALSE OVERWRITE=TRUE"
```

Use the provided example SQL file:

```sh
snow sql --connection mpz --role ACCOUNTADMIN -f snowflake/execute_container_job.example.sql
```

The example matters here: Snowflake specification template arguments must be alphanumeric or valid JSON, so values such as `dlt-hub`, image URLs, and schema names with underscores are passed as JSON strings in the `USING (...)` clause.
That example:
- drops any prior demo service with the same name
- runs `EXECUTE JOB SERVICE` asynchronously
- waits for completion with `!SPCS_WAIT_FOR`
- describes the job service at the end
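The quoting rule for template arguments can be made concrete with a small sketch. The helper `render_using` below is hypothetical, not part of the repo; it only illustrates that bare alphanumeric values can pass as-is while anything with dashes, slashes, or underscores must be a JSON string literal:

```python
import json

def render_using(params: dict) -> str:
    """Render a USING (...) clause for a spec template.

    Values must be alphanumeric or valid JSON, so anything containing
    dashes, slashes, or underscores is emitted as a JSON string.
    """
    parts = []
    for name, value in params.items():
        if not (isinstance(value, str) and value.isalnum()):
            value = json.dumps(value)  # e.g. dlt-hub -> "dlt-hub"
        parts.append(f"{name} => {value}")
    return "USING (" + ", ".join(parts) + ")"
```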
Because this is a job service, the container exits when the load finishes.
The job spec does not mount Snowflake credentials. Instead, dlt uses Snowflake's OAuth token injected into the running SPCS container.
dlt restores pipeline state from the destination by default. That means the container does not need a persistent local state directory to keep incremental progress across runs.
In practice:
- `_dlt_pipeline_state` lives in Snowflake
- `_dlt_loads` records load history in Snowflake
- a fresh job container can restore state from the destination dataset and continue incrementally
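The fallback behavior can be sketched in a few lines (illustrative only; the state shape and the `last_updated_at` field name are assumptions, not dlt's actual internal schema):

```python
def effective_cursor(local_state, destination_state):
    """Choose the incremental cursor for a run.

    A fresh job container has no local working directory, so when local
    state is absent the cursor comes from state restored out of the
    destination dataset.
    """
    state = local_state or destination_state
    return state.get("last_updated_at") if state else None
```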
After a successful job run:
```sh
snow sql --connection mpz --role ACCOUNTADMIN -f snowflake/validate_demo.sql
```

That validates:

- `github_issues` exists
- `_dlt_loads` exists
- `_dlt_pipeline_state` exists
After the containerized dlt load lands raw issues in Snowflake, dbt reads that raw table as a source and builds a single curated view:
- source: `{{ source('github_raw', 'github_issues') }}`
- model: `github_issues_curated`
- default raw schema: `github_issues_demo`
- default curated schema: `github_issues_analytics`
The model intentionally stays small and SQL-only. It derives:
- `is_open`
- `issue_age_days`
- `days_since_update`
- `label_count`
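The same derivations, expressed in Python purely for illustration (the real model is a SQL view; the input field names follow the GitHub issues payload):

```python
from datetime import datetime, timezone

def derive_issue_fields(issue: dict, now: datetime) -> dict:
    """Python equivalent of the curated model's derived columns."""
    created = datetime.fromisoformat(issue["created_at"].replace("Z", "+00:00"))
    updated = datetime.fromisoformat(issue["updated_at"].replace("Z", "+00:00"))
    return {
        "is_open": issue["state"] == "open",
        "issue_age_days": (now - created).days,
        "days_since_update": (now - updated).days,
        "label_count": len(issue.get("labels", [])),
    }
```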
Snowflake dbt Projects require a profiles.yml, but Snowflake documentation warns that deployment stops if the committed profile contains passwords.
This repo therefore commits only profiles/profiles.template.yml. Create a temporary runtime copy:
```sh
export DBT_PROJECT_NAME='github_issues_demo_project'
export DBT_SNOWFLAKE_ROLE='<role>'
export DBT_SNOWFLAKE_WAREHOUSE='DLT_GITHUB_DEMO_WH'
export DBT_DATABASE='DLT_GITHUB_DEMO'
export DBT_SOURCE_SCHEMA='github_issues_demo'
export DBT_SCHEMA='github_issues_analytics'
tmpdir="$(mktemp -d)"
cp profiles/profiles.template.yml "$tmpdir/profiles.yml"
```

Deploy the DBT PROJECT object:
```sh
snow dbt deploy \
  --connection mpz \
  --database DLT_GITHUB_DEMO \
  --schema GITHUB_ISSUES_ANALYTICS \
  --warehouse DLT_GITHUB_DEMO_WH \
  --role "$DBT_SNOWFLAKE_ROLE" \
  --source . \
  --profiles-dir "$tmpdir" \
  --default-target dev \
  "$DBT_PROJECT_NAME"
```

Execute the curated model:
```sh
snow dbt execute \
  --connection mpz \
  --database DLT_GITHUB_DEMO \
  --schema GITHUB_ISSUES_ANALYTICS \
  --warehouse DLT_GITHUB_DEMO_WH \
  --role "$DBT_SNOWFLAKE_ROLE" \
  "$DBT_PROJECT_NAME" \
  run --select github_issues_curated
```

Clean up the temporary profile:
```sh
rm -rf "$tmpdir"
```

Optional curated-model spot check:

```sh
snow sql --connection mpz --role ACCOUNTADMIN -q "select issue_id, issue_number, is_open, issue_age_days from DLT_GITHUB_DEMO.GITHUB_ISSUES_ANALYTICS.GITHUB_ISSUES_CURATED limit 20"
```

These local checks exist to validate the repo and Python/dbt assets. They are not the supported runtime path for loading data.
```sh
uv run pytest
tmpdir="$(mktemp -d)"
cp profiles/profiles.template.yml "$tmpdir/profiles.yml"
uv run --python 3.13 dbt parse --profiles-dir "$tmpdir"
rm -rf "$tmpdir"
sh -n container/entrypoint.sh
```

If you also want to spot-check the container image locally and already have Docker installed, you can still run:

```sh
docker buildx build --platform linux/amd64 -f container/Dockerfile -t dlthubsnow:ci --load .
```

This repo now has three GitHub Actions workflows:
- `.github/workflows/manual-load.yml` validates the repo (`uv sync`, `pytest`, `dbt parse`, entrypoint syntax, CI image build)
- `.github/workflows/setup-spcs-infra.yml` runs the Snowflake setup SQL from GitHub Actions so you do not need local Docker or local Snow CLI for the one-time bootstrap
- `.github/workflows/deploy-spcs-job.yml` builds the image on the GitHub runner, pushes it to the Snowflake registry, stages the job spec, and executes the job service with Snowflake-provided OAuth inside the container
That means the recommended deployment path no longer depends on local Docker.
This repo now contains the assets and docs for the Snowflake-container runtime path, but the following still depend on a live Snowflake environment:
- compute pool creation privileges
- image repository access
- a Docker-capable build environment pushing to the Snowflake registry (the GitHub-hosted runner is the default choice)
- stage upload
- secret creation
- live `EXECUTE JOB SERVICE`
- live `snow dbt deploy` / `snow dbt execute`
- The `dlt` execution model is now Snowpark Container Services job service only.
- The local Python CLI remains in the repo because the container runs that packaged entrypoint.
- Secrets are intentionally excluded from version control.
- The dbt layer remains intentionally lightweight: one source and one curated model.