snowdcm

This repository takes the Snowflake setup that originally lived in evsnow as imperative SQL and splits it into two parts:

  • the part Snowflake DCM can manage declaratively
  • the part Snowflake DCM still cannot manage and therefore must remain imperative

If you only look at the final files, that split might seem obvious. But the important part is not just what moved: it is why it moved, what had to stay behind, and how to prove that DCM really ignores the unsupported objects. That is what this repository documents.

What problem does this solve?

The original evsnow setup created Snowflake infrastructure with imperative SQL. That works, but it means the desired state lives in a long sequence of CREATE, GRANT, and ALTER statements.

For the objects Snowflake DCM supports, that is no longer necessary. With DCM, you declare the desired state once, run plan, run deploy, and let Snowflake compare the definition files with the current state of the DCM project.

So this repository converts the evsnow setup into:

  • DCM definitions for supported objects
  • post-deploy SQL templates for unsupported objects
  • local Snowflake CLI workflow first
  • GitHub Actions workflow second

Start with the original setup

The original evsnow Snowflake setup creates these objects:

  • user: STREAMEV
  • role: STREAM
  • warehouse: COMPUTE_WH
  • database: INGESTION
  • schema: INGESTION.PUBLIC
  • database: CONTROL
  • schema: CONTROL.PUBLIC
  • control table: CONTROL.PUBLIC.INGESTION_STATUS
  • external volume: EXVOL
  • Iceberg table: INGESTION.PUBLIC.EVENTS_TABLE1
  • pipe: INGESTION.PUBLIC.EVENTS_TABLE_PIPE
  • grants across all of the above
  • RSA public key assignment for the runtime user

In this repository, those same objects are represented for two environments:

  • DEV, using the suffix _DEV
  • PROD, using no suffix

So, for example:

  • STREAM becomes STREAM_DEV in DEV
  • INGESTION.PUBLIC becomes INGESTION_DEV.PUBLIC in DEV
  • EXVOL becomes EXVOL_DEV in DEV
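
The suffix convention above can be captured in a few lines. This helper is illustrative only (it is not part of the repository; the real templates use a {{ env_suffix }} placeholder instead):

```python
# Illustrative only: mirrors the env-suffix naming convention described above.
# DEV appends "_DEV" to the top-level object name; PROD appends nothing.
SUFFIXES = {"DEV": "_DEV", "PROD": ""}

def env_name(base: str, target: str) -> str:
    """Apply the environment suffix to the first (account-level) name part."""
    head, dot, rest = base.partition(".")
    return head + SUFFIXES[target] + dot + rest

print(env_name("STREAM", "DEV"))            # STREAM_DEV
print(env_name("INGESTION.PUBLIC", "DEV"))  # INGESTION_DEV.PUBLIC
print(env_name("EXVOL", "PROD"))            # EXVOL
```

Note that only the database or account-level name takes the suffix; schema and object names below it stay unchanged.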

Now, what can DCM manage?

Snowflake DCM can manage the supported part of the setup from scratch.

In this repository, that means:

  • role
  • warehouse
  • databases
  • schemas
  • standard table for INGESTION_STATUS
  • grants on those DCM-managed objects

Those live under sources/definitions/.

Infrastructure

The file sources/definitions/infrastructure.sql defines:

  • STREAM{{ env_suffix }}
  • COMPUTE_WH{{ env_suffix }}
  • INGESTION{{ env_suffix }}
  • CONTROL{{ env_suffix }}
  • INGESTION{{ env_suffix }}.PUBLIC
  • CONTROL{{ env_suffix }}.PUBLIC

Table

The file sources/definitions/tables.sql defines the control table:

  • CONTROL{{ env_suffix }}.PUBLIC.INGESTION_STATUS

It uses a standard table on purpose.

The original evsnow docs support a hybrid-table option, but this repository keeps the DCM version compatible with the standard table path.

Grants

The file sources/definitions/grants.sql defines the grants that belong to those DCM-managed objects.

That includes:

  • database usage
  • schema usage
  • CREATE TABLE on the ingestion schema
  • SELECT, INSERT, UPDATE on the control table
  • warehouse usage

It does not include grants from the environment role to Snowflake system roles.

Those grants are administrative wiring, not stable DCM-managed state.

In practice, granting STREAM_DEV or STREAM to SYSADMIN through DCM caused repeated drift on SYSADMIN, so that grant is intentionally kept outside the DCM definitions.

What still cannot move to DCM?

Some objects from the original evsnow setup are not yet supported by Snowflake DCM. In this repository's design, those objects remain imperative:

  • user
  • external volume
  • Iceberg table
  • pipe
  • RSA public key assignment on the user
  • grants on the external volume, pipe, and Iceberg table

In concrete DEV names, that means:

  • STREAMEV_DEV
  • EXVOL_DEV
  • INGESTION_DEV.PUBLIC.EVENTS_TABLE1
  • INGESTION_DEV.PUBLIC.EVENTS_TABLE_PIPE

These stay in the SQL templates under scripts/:

  • scripts/post_deploy.sql.tmpl
  • scripts/post_deploy_grants.sql.tmpl

Why keep the unsupported objects imperative?

Because this is not just a matter of convenience: DCM does not own those objects here. Pretending they are part of the DCM state would blur the line between two different control models:

  • DCM for objects Snowflake can compare and reconcile
  • imperative SQL for objects that must still be created directly

Keeping them separate makes the boundary explicit.

That is also why the grants are split.

The grants on DCM-managed objects live in sources/definitions/grants.sql.

The grants that depend on unsupported objects live in scripts/post_deploy_grants.sql.tmpl.

There is one more constraint

You might expect DCM to manage everything, including the DCM project itself. But DCM project objects are schema-level objects stored in Snowflake, so before DCM can manage your Snowflake objects, you still need one bootstrap step outside DCM to create the place where the DCM project will live.

This repository uses scripts/bootstrap.sql for that one-time setup.

It creates:

  • database DCM_ADMIN
  • schema DCM_ADMIN.EVSNOW
  • DCM project objects for DEV and PROD

Local setup

This repository assumes:

  • Snowflake CLI connection name: mpz
  • Snowflake account identifier: <your-snowflake-account-identifier>
  • DCM project schema: DCM_ADMIN.EVSNOW

Replace the account identifier placeholder in manifest.yml before running the project.
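
The swap can be done by hand or scripted. This tiny helper is hypothetical (the repository does not ship it) and assumes the placeholder string appears verbatim in manifest.yml:

```python
from pathlib import Path

# Hypothetical helper; assumes the placeholder below appears verbatim
# in manifest.yml. Not part of the repository.
PLACEHOLDER = "<your-snowflake-account-identifier>"

def set_account(manifest_path: str, account: str) -> None:
    """Replace the account-identifier placeholder in the manifest file."""
    path = Path(manifest_path)
    path.write_text(path.read_text().replace(PLACEHOLDER, account))
```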

Check the tools

snow --version
snow connection test -c mpz

DCM requires Snowflake CLI 3.16+.

Bootstrap the DCM project objects

Run the bootstrap once with a role that can create databases, schemas, and DCM projects.

snow sql -c mpz -f scripts/bootstrap.sql

Create the DCM projects if needed

snow dcm create --connection mpz --target DEV --if-not-exists
snow dcm create --connection mpz --target PROD --if-not-exists

Plan the DCM-managed state

snow dcm plan --connection mpz --target DEV --save-output

Deploy the DCM-managed state

snow dcm deploy --connection mpz --target DEV

At that point, the supported objects are managed.

The unsupported ones are still missing.

Render and apply the unsupported objects

Set the values needed by the SQL templates first:

export ENV_SUFFIX="_DEV"
export AZURE_STORAGE_ACCOUNT="replace-me"
export AZURE_STORAGE_CONTAINER="replace-me"
export AZURE_TENANT_ID="replace-me"
export RSA_PUBLIC_KEY="replace-me"

python3 scripts/render_template.py scripts/post_deploy.sql.tmpl out/post_deploy_dev.sql
python3 scripts/render_template.py scripts/post_deploy_grants.sql.tmpl out/post_deploy_grants_dev.sql

snow sql -c mpz -f out/post_deploy_dev.sql
snow sql -c mpz -f out/post_deploy_grants_dev.sql

Repeat the same flow for PROD with:

export ENV_SUFFIX=""
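
The real scripts/render_template.py is not reproduced here; the following is a minimal sketch of what such a renderer typically does, assuming {{ NAME }} placeholders are filled from environment variables whose names match the placeholders exactly (the actual script may differ):

```python
import os
import re
import sys

def render(template_text: str, values: dict) -> str:
    """Replace each {{ NAME }} placeholder with values[NAME]; fail on gaps."""
    def lookup(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing template value: {key}")
        return values[key]
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lookup, template_text)

if __name__ == "__main__":
    src, dst = sys.argv[1], sys.argv[2]
    rendered = render(open(src).read(), dict(os.environ))
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    with open(dst, "w") as out:
        out.write(rendered)
```

Failing loudly on a missing variable matters here: a silently empty ENV_SUFFIX would render PROD object names into a DEV run.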

Azure-specific step for the external volume

The external volume is the part that needs Azure.

Creating the SQL template is not enough.

You also need the Azure-side prerequisites that Snowflake expects for an Azure external volume.

In practice, this means:

  1. create the external volume in Snowflake
  2. run DESC EXTERNAL VOLUME <name>
  3. get the AZURE_CONSENT_URL
  4. open that URL and grant tenant consent
  5. make sure the Snowflake Azure application has the required RBAC on the storage account
  6. then create or verify the Iceberg table and pipe
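
Step 3 in the list above is a lookup in the DESC output. A hedged sketch of that extraction, assuming the result arrives as property/property_value rows and that the consent URL may sit either in its own row or inside the JSON value of a storage-location row (the exact shape varies by client, so both cases are checked):

```python
import json

def find_consent_url(rows: list) -> "str | None":
    """Search DESC EXTERNAL VOLUME rows for AZURE_CONSENT_URL.

    Assumes each row is a dict with 'property' and 'property_value' keys;
    the URL may appear as its own property or nested inside a JSON value.
    """
    for row in rows:
        if row.get("property") == "AZURE_CONSENT_URL":
            return row.get("property_value")
        value = row.get("property_value") or ""
        if value.startswith("{"):
            try:
                nested = json.loads(value)
            except ValueError:
                continue
            if "AZURE_CONSENT_URL" in nested:
                return nested["AZURE_CONSENT_URL"]
    return None
```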

In the successful DEV run, the storage-side inputs followed this shape:

  • storage account: <your-storage-account>
  • container: <your-container>
  • tenant id: <your-tenant-id>

And the required Snowflake-created service principal was granted the storage-account-level Azure roles needed for the external volume flow.

What does the repository look like?

.
├── manifest.yml
├── sources/
│   └── definitions/
│       ├── grants.sql
│       ├── infrastructure.sql
│       └── tables.sql
├── scripts/
│   ├── bootstrap.sql
│   ├── render_template.py
│   ├── post_deploy.sql.tmpl
│   └── post_deploy_grants.sql.tmpl
├── docs/
│   └── evidence/
│       └── dev/
└── .github/workflows/
    ├── dcm_deploy.yml
    └── dcm_plan.yml

What did we verify locally?

It is easy to say that DCM manages the supported objects and ignores the unsupported ones.

It is better to prove it.

So the DEV environment was tested with a simple sequence:

  1. verify the unsupported objects existed
  2. run snow dcm plan
  3. drop only the unsupported objects
  4. verify they were gone
  5. run snow dcm plan
  6. run snow dcm deploy
  7. run snow dcm plan again

The result was the important part:

  • before the drop, DCM reported no changes
  • after dropping the unsupported objects, DCM still reported no changes
  • after a no-op deploy, DCM still reported no changes

That means those unsupported objects are outside the DCM-managed state.

You can inspect the evidence in docs/evidence/dev/.

Important artifacts:

  • screenshot before the drop: docs/evidence/dev/screenshots/01_imperative_objects_before.txt.png
  • plan before the drop: docs/evidence/dev/screenshots/02_plan_before_drop.txt.png
  • drop execution: docs/evidence/dev/screenshots/03_drop_imperative_objects.txt.png
  • verification after the drop: docs/evidence/dev/screenshots/04_imperative_objects_after_drop.txt.png
  • plan after the drop: docs/evidence/dev/screenshots/05_plan_after_drop.txt.png
  • deploy after the drop: docs/evidence/dev/screenshots/06_deploy_after_drop.txt.png
  • plan after the no-op deploy: docs/evidence/dev/screenshots/07_plan_after_noop_deploy.txt.png

The tracked evidence index is in docs/evidence/dev/README.md.

The terminal-capture walkthrough for the fresh DCM plan, deploy, and follow-up plan runs is in docs/evidence/dev/dcm-terminal-captures.md.

GitHub Actions

Two workflows are included.

dcm_plan.yml

This workflow runs on pull requests and manual dispatch.

It:

  • installs the Snowflake CLI
  • configures a Snowflake connection using a programmatic access token
  • ensures the DCM project exists
  • runs snow dcm plan for DEV and PROD
  • uploads plan artifacts
  • comments a summary on the pull request

dcm_deploy.yml

This workflow runs on main pushes and manual dispatch.

It:

  • deploys DEV first
  • blocks destructive drops for important DCM-managed object types
  • renders the unsupported SQL templates
  • applies the unsupported SQL after DCM deploy
  • then repeats the process for PROD

Required GitHub variables

  • SNOWFLAKE_ACCOUNT
  • SNOWFLAKE_ROLE
  • AZURE_STORAGE_ACCOUNT
  • AZURE_STORAGE_CONTAINER
  • AZURE_TENANT_ID
  • RSA_PUBLIC_KEY

Required GitHub secrets

  • SNOWFLAKE_USER
  • SNOWFLAKE_PROGRAMMATIC_ACCESS_TOKEN

Optional GitHub environments

  • DEV
  • PROD

If you use GitHub environments, the workflows can use them for approvals and scoped settings.

If you do not use them, remove the environment: keys from the workflow files.

The short version

If you want the shortest explanation, it is this:

  • evsnow started with imperative Snowflake SQL
  • this repository moved the supported objects to DCM
  • the unsupported objects stayed imperative on purpose
  • the boundary is explicit in the file layout
  • the local DEV evidence shows DCM does not recreate unsupported objects during plan or deploy

If Snowflake later adds support for users, external volumes, Iceberg tables, or pipes in DCM, the imperative templates can be retired and moved into sources/definitions/.
