Snowflake Interview Questions
1. What is Snowflake?
Snowflake is a cloud data warehouse platform built on top of the public cloud (AWS, Azure, or GCP) for storing and retrieving data. It enables data storage, processing, and analytics solutions that are faster, easier to use, and far more flexible than traditional offerings. It is a true software-as-a-service (SaaS) offering: there is no hardware or software for you to install, configure, or manage, and all ongoing maintenance, management, and tuning is handled by Snowflake.
2. What is unique about the Snowflake cloud data warehouse?
Auto-scaling, zero-copy cloning, time travel, and multi-cluster virtual warehouses; all data is compressed, encrypted, and stored in columnar format.
3. Why is Snowflake a 5-star cloud data warehouse?
1. True separation of storage and compute
2. Zero copy cloning
3. Data share
4. Memory cache
5. Time travel
4. PaaS (platform as a service) vs SaaS (software as a service)?
Platform as a service (PaaS) provides hardware and software tools over the internet, which people use to develop applications. Examples: Microsoft Azure, Google App Engine, etc.
Software as a service (SaaS): software available over the internet for which users pay a monthly or yearly fee. Examples: Dropbox, Salesforce, GoToMeeting, Cisco WebEx, etc.
Infrastructure as a service (IaaS): cloud-based, pay-as-you-go services such as storage, networking, and virtualization. Examples: AWS, Azure, GCP, etc.
5. Snowflake query performance tuning?
1. Avoid rescanning files when loading data into Snowflake with the COPY INTO command: when transferring data from an on-premise system to cloud storage and then loading it into Snowflake with COPY INTO, rely on the load metadata so already-loaded files are skipped rather than scanned again.
2. Choose a sensible virtual warehouse size: when loading a single large data file into Snowflake, scaling up to a bigger virtual warehouse gives little or no performance benefit, because one file cannot be loaded in parallel.
The best approach is to split a large file (e.g. a 10 GB CSV) into multiple smaller files (e.g. 100 files of roughly 100 MB) to take advantage of Snowflake's automatic parallel execution.
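The split-and-load approach above can be sketched as follows; stage, table, and file names are illustrative assumptions, and PUT runs from a SnowSQL session:

```sql
-- Split the 10 GB file locally first, e.g.: split -b 100m big_file.csv part_
-- Then upload the pieces to a named internal stage:
PUT file:///tmp/parts/part_*.csv @my_stage AUTO_COMPRESS = TRUE;

-- Load all pieces in one COPY; Snowflake processes the files in parallel
-- and skips any file it has already loaded into this table.
COPY INTO my_table
  FROM @my_stage
  PATTERN = '.*part_.*\\.csv\\.gz'
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```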
Here's a very simple approach that doesn't need any temporary tables. It will work very nicely
for small tables but might not be the best approach for large tables.
52. What does the compute layer (virtual warehouse layer) do in Snowflake?
It manages query execution and DML operations within elastic clusters of virtual machines.
53. What does the services layer (cloud services layer) do in Snowflake?
It is a collection of services that coordinates everything executed on Snowflake. It manages authentication and access control, security, query optimization, transactions, and metadata, and it is often called the brain of the system.
54. What are the different ways to access the Snowflake cloud data warehouse?
SnowSQL (CLI)
ODBC drivers
JDBC drivers
Web user interface
Python libraries
55. What are the data security features in Snowflake?
Snowflake encrypts all customer data by default using end-to-end encryption (E2EE) with the latest security standards, at no additional cost. Snowflake also provides best-in-class key management, which is entirely transparent to customers.
56. How does data compression work in Snowflake?
All data is compressed by default in Snowflake. Snowflake chooses the best compression algorithm, and this is not configurable by the user. Storage is charged based on the size of the data after compression is applied.
57. What are the benefits of Snowflake compression?
Storage costs are lower than native cloud storage due to compression.
No storage cost for on-disk caches.
Near-zero storage overhead for data cloning or data sharing.
58. Different types of caching in Snowflake?
Query result cache, virtual warehouse (local disk) cache, and metadata cache.
59. Snowflake cache layers?
1. Result cache: stores the results of every query executed in the past 24 hours.
2. Local disk cache (virtual warehouse cache): whenever data is needed for a query, it is retrieved from remote disk storage and cached in the SSD and memory of the virtual warehouse. This data remains only while the virtual warehouse is active.
3. Remote disk: the long-term storage layer. This level is responsible for data resilience, which in the case of Amazon Web Services means 99.999999999% (eleven 9s) durability, even in the event of an entire data center failure.
60. What is snowflake cache?
A cache is a temporary storage area.
61. Different states of a Snowflake virtual warehouse, and cold vs warm vs hot virtual warehouse?
1. Run from a cold virtual warehouse: when all your virtual warehouses are suspended and you run a query, Snowflake starts a new virtual warehouse (with no local disk cache) and executes the query.
2. Run from a warm virtual warehouse: your virtual warehouse has been active for a while and has processed a few queries; this is called a warm virtual warehouse. If you disable result caching and repeat a query, it makes use of the local disk cache populated by earlier queries, which is termed warm caching.
3. Run from a hot virtual warehouse: you repeat the query with result caching switched on. The results are served entirely from the result cache, and this is the most efficient of the three.
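The warm vs hot behaviour can be observed by toggling the result cache; a minimal sketch, assuming an illustrative table name:

```sql
-- Hot: the repeated execution is served from the result cache.
SELECT COUNT(*) FROM sales;
SELECT COUNT(*) FROM sales;   -- served from the result cache

-- Warm: disable the result cache; the repeat now reads from the
-- warehouse's local disk cache instead of remote storage.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
SELECT COUNT(*) FROM sales;
```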
62. Is there a cost associated with time travel in Snowflake?
Yes: storage costs are incurred for maintaining historical data during both the time travel and fail-safe periods.
63. Time travel vs fail-safe in Snowflake?
If you perform operations on a Snowflake table and want to get back an older version of the table, you can use the time travel feature. The retention period set for the table determines how far back you can travel.
Once the table's retention period is over, the data moves into fail-safe, a continuous backup provided by Snowflake. Data is stored for up to 7 days in the fail-safe zone after the retention period ends. No user operations are allowed on it; only Snowflake can retrieve that data for you in case of a disaster.
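Time travel can be exercised with the AT and BEFORE clauses; a minimal sketch, where the table name, offset, and query ID are illustrative placeholders:

```sql
-- Read the table as it looked one hour ago.
SELECT * FROM orders AT (OFFSET => -3600);

-- Restore an accidentally dropped table within the retention period.
UNDROP TABLE orders;

-- Recreate the state before a specific statement (query ID is a placeholder).
CREATE TABLE orders_restored CLONE orders
  BEFORE (STATEMENT => '01a2b3c4-0000-0000-0000-000000000000');
```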
64. What is zero-copy cloning in Snowflake?
Cloning is also referred to as zero-copy cloning. It creates a copy of a database, schema, or table without duplicating the associated storage files on disk, which saves you storage cost.
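A minimal sketch, with illustrative object names:

```sql
-- Clone a table; no data is physically copied until either side changes.
CREATE TABLE orders_dev CLONE orders;

-- Databases and schemas can be cloned the same way.
CREATE DATABASE analytics_dev CLONE analytics;
```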
66. If I do a DDL operation on the cloned table, will that affect my original table?
No.
67. If there are changes to the original table's data, like inserts or updates of rows, will they automatically be reflected in the cloned table as well?
No.
68. If data is deleted from the original table, will the cloned table still be able to see the deleted data?
Yes. The cloned table still has the original data.
69. Since zero-copy cloning is a copy of the database, schema, or table, do we pay for storage twice or only once?
There is no additional storage cost for a zero-copy clone, as all the data is shared with the source version. The only additional cost is for any changed data blocks, which are local to the cloned copy.
70. What type of tool is Snowflake? Is Snowflake an ETL tool?
Yes, Snowflake supports an ETL flow that consists of three steps:
1. Extract data. Data files support multiple formats, such as JSON, CSV, and XML.
2. Load the data into an external or internal stage.
3. Copy the data into a Snowflake database table using the COPY INTO command.
71. How is data stored in Snowflake?
Snowflake stores data in multiple micro-partitions, which are internally optimized and compressed. The data is stored in columnar format.
72. What kind of database is Snowflake?
All of Snowflake's features are built on top of a SQL (Structured Query Language) database. The data is stored in columns, and it is compatible with other tools, including Excel and Tableau. As a SQL database, Snowflake provides a query tool, supports multi-statement transactions, provides role-based security, etc.
73. Snowflake editions?
1. Standard edition: provides unlimited access to Snowflake's standard features.
2. Enterprise edition: provides all the features and services of Standard edition, with additional features designed specifically for the needs of large-scale enterprises and organizations.
3. Business Critical edition: formerly known as Enterprise for Sensitive Data (ESD). Provides high-level data protection to support organizations with sensitive data.
4. Virtual Private Snowflake (VPS): provides the highest level of security for organizations dealing with financial activities.
OBJECT: a collection of key-value pairs, where the key is a non-empty string and the value is a value of VARIANT type. Snowflake does not currently support explicitly-typed objects.
ARRAY: sparse arrays of arbitrary size, where the index is a non-negative integer and values have VARIANT type. Snowflake does not currently support arrays of elements of a specific non-VARIANT type.
79. Snowflake VARIANT data type?
VARIANT is the column data type used for ingesting semi-structured data. It can store values of any other type, including OBJECT and ARRAY, up to a maximum size of 16 MB compressed.
A view is the recommended approach for making a VARIANT column accessible in a BI tool.
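A minimal sketch of loading and querying a VARIANT column, then exposing it through a view; all object names are illustrative:

```sql
CREATE TABLE raw_events (payload VARIANT);

INSERT INTO raw_events
  SELECT PARSE_JSON('{"user": {"id": 42, "name": "Ada"}, "action": "login"}');

-- Traverse the semi-structured data with : and dot notation.
SELECT payload:user.name::STRING AS user_name,
       payload:action::STRING    AS action
FROM raw_events;

-- A view makes the VARIANT column friendly to BI tools.
CREATE VIEW events_v AS
  SELECT payload:user.id::NUMBER AS user_id,
         payload:action::STRING  AS action
  FROM raw_events;
```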
80. Snowflake stored procedures?
Stored procedures are similar to functions: once a stored procedure is created, it can be executed many times. A stored procedure is created with the CREATE PROCEDURE command and executed with the CALL command.
A stored procedure returns a single value.
Benefits of stored procedures:
Procedural logic (branching and looping), which straight SQL does not support.
Error handling.
Dynamically creating a SQL statement and executing it.
One common use for stored procedures is to automate a task that requires multiple SQL statements and is performed frequently.
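A minimal JavaScript stored procedure sketch illustrating dynamic SQL; the procedure name and table are illustrative assumptions:

```sql
CREATE OR REPLACE PROCEDURE purge_old_rows(days FLOAT)
  RETURNS STRING
  LANGUAGE JAVASCRIPT
AS
$$
  // Build the statement dynamically and execute it via the JavaScript API.
  // Note: argument names are referenced in uppercase inside the JS body.
  var stmt = snowflake.createStatement({
    sqlText: "DELETE FROM events WHERE event_date < DATEADD(day, -" + DAYS +
             ", CURRENT_DATE())"
  });
  stmt.execute();
  return "purged rows older than " + DAYS + " days";
$$;

CALL purge_old_rows(30);
```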
81. Stored procedure vs UDF (user-defined function)?
1. A stored procedure is called as an independent statement, whereas a function is called as part of a statement.
2. A stored procedure may return a value, but it is not required to.
A function is required to return a value.
3. The value returned by a stored procedure cannot be used directly in SQL.
The value returned by a function can be used directly in SQL.
4. A single executable statement can call only one stored procedure, whereas a single SQL statement can call multiple functions.
A stored procedure cannot be called as part of an expression; a function can.
5. Snowflake provides a JavaScript API (in the form of JavaScript objects and methods). The API enables stored procedures to execute database operations such as SELECT, UPDATE, and CREATE.
82. What is dynamic data masking?
Dynamic data masking is a column-level security feature that uses masking policies to selectively mask plain-text data in table and view columns at query time.
Snowflake supports dynamic data masking on tables and views. The dynamic data masking privileges are APPLY and OWNERSHIP.
Apply masking policy to columns:
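A hedged sketch of creating and applying a masking policy; the policy, role, table, and column names are illustrative:

```sql
-- Mask email addresses for everyone except a privileged role.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
    ELSE '*** MASKED ***'
  END;

-- Attach the policy to a column; it is evaluated at query time.
ALTER TABLE customers MODIFY COLUMN email
  SET MASKING POLICY email_mask;
```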
2. Left outer join: returns all rows from the left table, and the matched rows from the right table.
3. Right outer join: returns all rows from the right table, and the matched rows from the left table.
4. Full outer join: returns all matching and non-matching records from both tables.
5. Cross join: combines each row in the first table with each row in the second table, omitting the join condition.
6. Natural join: used when two tables contain columns that have the same name and the data in those columns corresponds. A natural join can be combined with an outer join.
A natural join implicitly constructs the ON clause: ON table2.city = table1.city AND table2.province = table1.province.
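The natural join above can be sketched against its explicit form; table and column names are illustrative:

```sql
-- NATURAL JOIN joins on all identically named columns (city, province here).
SELECT *
FROM table1
NATURAL JOIN table2;

-- Roughly equivalent explicit form (the common columns appear twice here):
SELECT *
FROM table1
JOIN table2
  ON table2.city = table1.city
 AND table2.province = table1.province;
```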
85. How to capture all the rejected records in snowflake?
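One common approach to capturing rejected records uses ON_ERROR together with the VALIDATE table function after a COPY; a hedged sketch with illustrative names:

```sql
-- Run the load; continue past bad rows instead of aborting.
COPY INTO my_table
  FROM @my_stage
  ON_ERROR = 'CONTINUE';

-- Afterwards, list every record rejected by the last COPY into the table.
SELECT * FROM TABLE(VALIDATE(my_table, JOB_ID => '_last'));
```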
86. Calculate maximum, minimum and average salary column in employee table?
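A straightforward aggregate query answers this, using the table and column named in the question:

```sql
SELECT MAX(salary) AS max_salary,
       MIN(salary) AS min_salary,
       AVG(salary) AS avg_salary
FROM employee;
```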
89. Difference between RANK, DENSE_RANK and ROW_NUMBER in Snowflake?
RANK: if there are duplicate values in the amount column, it gives them the same ranking number but skips the next ranking number(s). This is the default behavior of the RANK function.
DENSE_RANK: if there are duplicate values in the amount column, it gives them the same ranking number and does not skip the next ranking number. This is the default behavior of the DENSE_RANK function.
ROW_NUMBER: it always gives a straight, sequential ranking number; even duplicate values receive distinct, consecutive numbers. This is the default behavior of the ROW_NUMBER function.
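A sketch contrasting the three functions on a tie in second place; table and column names are illustrative:

```sql
SELECT amount,
       RANK()       OVER (ORDER BY amount DESC) AS rnk,        -- 1, 2, 2, 4
       DENSE_RANK() OVER (ORDER BY amount DESC) AS dense_rnk,  -- 1, 2, 2, 3
       ROW_NUMBER() OVER (ORDER BY amount DESC) AS row_num     -- 1, 2, 3, 4
FROM payments;
```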
91. How to copy all the rows from one table to another table without an INSERT statement?
Note that when you rename a table, the old table name is removed.
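One way to copy all rows without an INSERT statement is CREATE TABLE ... AS SELECT (CTAS); a sketch with illustrative names:

```sql
-- Copies both the structure and all rows of the source table.
CREATE TABLE employees_backup AS
  SELECT * FROM employees;

-- A zero-copy clone achieves the same result without duplicating storage.
CREATE TABLE employees_clone CLONE employees;
```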
97. How to add, delete and modify a column in Snowflake?
Add column:
Drop column:
Modify column:
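These can be sketched with ALTER TABLE; table and column names are illustrative:

```sql
-- Add a column.
ALTER TABLE employees ADD COLUMN hire_date DATE;

-- Drop a column.
ALTER TABLE employees DROP COLUMN hire_date;

-- Modify a column, e.g. widen a VARCHAR.
ALTER TABLE employees ALTER COLUMN name SET DATA TYPE VARCHAR(200);
```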
Used to generate sequential or unique numbers across sessions and statements, including concurrent statements. A sequence can supply a primary key or any column that requires a unique value.
It is used in dimension tables and fact tables, and it can likely be used in star schemas and snowflake schemas as well.
Example:
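A minimal sequence sketch, with illustrative names:

```sql
CREATE SEQUENCE order_seq START = 1 INCREMENT = 1;

-- Use the sequence as a surrogate key.
CREATE TABLE orders (
  order_id NUMBER DEFAULT order_seq.NEXTVAL,
  amount   NUMBER
);

INSERT INTO orders (amount) VALUES (100), (250);
SELECT order_seq.NEXTVAL;  -- each call returns the next unique value
```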
https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions.html#query-
pruning
3. What Snowflake objects can be shared?
Databases, tables, external tables, secure views, secure materialized views, and secure UDFs.
4. Types of stages in Snowflake?
1. User stage: represented by @~. Each user has a Snowflake stage allocated to them by default for storing files. This stage is a convenient option if your files will be accessed by a single user but need to be copied into multiple tables.
2. Table stage: represented by @%. Each table has a Snowflake stage allocated to it by default for storing files. This stage is a convenient option if your files will be accessed by multiple users but need to be copied into a single table.
3. Named stage: represented by @. Named internal stages are database objects that provide the most flexibility for data loading. Because they are database objects, the security/access rules that apply to all objects apply.
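A hedged sketch of the three stage forms in use; file, table, and stage names are illustrative, and PUT runs from a SnowSQL session:

```sql
-- User stage (@~): private to the current user.
PUT file:///tmp/data.csv @~/staged/;

-- Table stage (@%mytable): tied to one table.
PUT file:///tmp/data.csv @%mytable;

-- Named stage (@my_stage): a database object, shareable via grants.
CREATE STAGE my_stage;
PUT file:///tmp/data.csv @my_stage;
LIST @my_stage;
```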
5. DDL (data definition language) commands in SF?
Used to create, manipulate, and modify objects in Snowflake, such as users, virtual warehouses, databases, schemas, tables, views, columns, functions, and stored procedures.
1. ALTER
2. COMMENT
3. CREATE
4. DESCRIBE
5. DROP
6. SHOW
7. USE
6. DML (data manipulation language) commands in SF?
General DML, data loading/unloading DML and file staging commands (for data
loading/unloading).
General DML:
1. INSERT
2. INSERT (multiple table)
3. MERGE
4. UPDATE
5. DELETE
6. TRUNCATE TABLE
Data loading/unloading DML
1. COPY INTO <table> (loading/import data)
2. COPY INTO <location> (unloading/exporting data)
3. VALIDATE (table function)
File staging commands (for data loading/unloading)
1. PUT
2. GET
3. LIST
4. REMOVE
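Of the general DML commands, MERGE is the least familiar; a hedged sketch of an upsert, with illustrative table names:

```sql
-- Upsert: update matching rows, insert the rest.
MERGE INTO customers AS tgt
USING customer_updates AS src
  ON tgt.id = src.id
WHEN MATCHED THEN
  UPDATE SET tgt.email = src.email
WHEN NOT MATCHED THEN
  INSERT (id, email) VALUES (src.id, src.email);
```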
4. SCIM: allows the automated management of user identities and groups (i.e. roles) by creating an interface between Snowflake and a third-party identity provider (IdP).
Mandatory parameters:
TYPE = SCIM (specifies the type of integration; creates a security interface between Snowflake and a client that supports SCIM)
SCIM_CLIENT = 'OKTA' | 'AZURE' | 'GENERIC' (specifies the SCIM client)
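A hedged sketch of creating a SCIM security integration for Okta; the integration and role names follow Snowflake's documented convention but are assumptions here:

```sql
-- A role the IdP will run as, with provisioning privileges.
CREATE ROLE IF NOT EXISTS okta_provisioner;
GRANT CREATE USER ON ACCOUNT TO ROLE okta_provisioner;
GRANT CREATE ROLE ON ACCOUNT TO ROLE okta_provisioner;

CREATE SECURITY INTEGRATION okta_scim
  TYPE = SCIM
  SCIM_CLIENT = 'OKTA'
  RUN_AS_ROLE = 'OKTA_PROVISIONER';
```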
28
SQL scripts that generate a date dimension in Snowflake, using the built-in GENERATOR and SEQ functions.
Create a date dimension or calendar table, also known as a date scaffold or date spine.
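A hedged sketch using GENERATOR and SEQ4; the start date, row count, and table name are illustrative:

```sql
-- One row per day for roughly 10 years starting 2020-01-01.
CREATE TABLE dim_date AS
SELECT DATEADD(day, SEQ4(), '2020-01-01'::DATE) AS date_day,
       YEAR(date_day)                           AS year,
       MONTH(date_day)                          AS month,
       DAYNAME(date_day)                        AS day_name
FROM TABLE(GENERATOR(ROWCOUNT => 3653));
```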
1. Conceptual modeling: defines what data the system contains; used to organize, scope, and define business concepts and rules.
2. Logical modeling: defines how a data system should be implemented; used to develop a technical map of rules and data structures.
3. Physical modeling: defines how the data system will be implemented according to the specific use case.
29. Types of data warehousing?
1. Enterprise Data warehousing
2. Operational data store
3. Data mart
30. What are the approaches used by optimizer during execution plan?
1. Rule based
2. Cost based
31. What is normalization?
Normalization splits data up into additional tables and helps reduce data redundancy.
32. Why partitioning?
Partitioning is done for various reasons, such as easier management and enhanced performance.
33. SCDs (slowly changing dimensions)?
Slowly changing dimensions are dimensions in which the data changes slowly, rather than changing regularly on a time basis.
34. Primary key vs unique key?
A primary key will not accept NULL values, whereas a unique key accepts NULL values.
A table can have only one primary key, whereas it can have multiple unique keys.
A clustered index is created automatically when you define a primary key, whereas a unique key generates a non-clustered index.
35. VARIANT data type size limit in Snowflake, or data size limitations?
There is a 16 MB (compressed) size limit on individual rows.
It is recommended to enable the STRIP_OUTER_ARRAY file format option in the COPY INTO command to remove the outer array structure and load the records into separate table rows.
Views are automatically updated as the underlying table's data changes. If the underlying tables have schema changes (column names are changed, for example), then the view will need to be recreated; otherwise, views are dynamically updated.
2. Materialized views: a materialized view stores a copy of the query result, improving query performance and efficiency. It requires zero maintenance and no user input; Snowflake automatically keeps it refreshed with the base table.
If the base tables are updated frequently, a lot of compute is used to keep the materialized view up to date.
3. Secure views: a secure view is designed for data privacy, i.e. limiting access to sensitive data that should not be exposed to all users of the underlying tables.
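A hedged sketch of the three view types; object names are illustrative, and materialized views require Enterprise edition or higher:

```sql
-- Regular view: re-evaluated on every query.
CREATE VIEW active_users_v AS
  SELECT * FROM users WHERE status = 'active';

-- Materialized view: Snowflake stores and auto-refreshes the result.
CREATE MATERIALIZED VIEW daily_totals_mv AS
  SELECT order_date, SUM(amount) AS total
  FROM orders GROUP BY order_date;

-- Secure view: definition and optimizations are hidden from non-owners.
CREATE SECURE VIEW customer_public_v AS
  SELECT id, name FROM customers;
```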