Certified Data Engineer Associate
Question: 1 CertyIQ
Answer: B
Explanation:
Unity Catalog in Databricks helps eliminate data silos in an organization by providing a single source of truth for data.
Question: 2 CertyIQ
Which of the following describes a scenario in which a data team will want to utilize cluster pools?
Answer: A
Explanation:
Using cluster pools reduces cluster start-up time, so in this case the reports can be refreshed quickly without having to wait long for a cluster to start.
Question: 3 CertyIQ
Which of the following is hosted completely in the control plane of the classic Databricks architecture?
A. Worker node
B. JDBC data source
C. Databricks web application
D. Databricks Filesystem
E. Driver node
Answer: C
Explanation:
C. Databricks web application.
In the classic Databricks architecture, the control plane includes components like the Databricks web application, the Databricks REST API, and the Databricks workspace. These components are responsible for managing and controlling the Databricks environment, including cluster provisioning, notebook management, access control, and job scheduling.
The other options, such as worker nodes, JDBC data sources, the Databricks Filesystem (DBFS), and driver nodes, are part of the data plane or the execution environment, which is separate from the control plane. Worker nodes execute tasks and computations, JDBC data sources connect to external databases, DBFS is a distributed file system for data storage, and driver nodes coordinate the execution of Spark jobs.
Question: 4 CertyIQ
Which of the following benefits of using the Databricks Lakehouse Platform is provided by Delta Lake?
Answer: D
Explanation:
Delta Lake is a key component of the Databricks Lakehouse Platform, and one of its most significant benefits is its ability to support both batch and streaming workloads seamlessly. Delta Lake allows you to process and analyze data in real time (streaming) as well as in batch, making it a versatile choice for various data processing needs.
While the other options may be benefits or capabilities of Databricks or the Lakehouse Platform in general, they are not specifically associated with Delta Lake.
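As a rough illustration (assuming a Databricks notebook where spark is the ambient SparkSession and a hypothetical Delta table named events), the same Delta table can be read both as a batch source and as a streaming source:

# Batch read of the Delta table
batch_df = spark.read.table("events")
# Incremental/streaming read of the same Delta table
stream_df = spark.readStream.table("events")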
Question: 5 CertyIQ
Which of the following describes the storage organization of a Delta table?
A. Delta tables are stored in a single file that contains data, history, metadata, and other attributes.
B. Delta tables store their data in a single file and all metadata in a collection of files in a separate location.
C. Delta tables are stored in a collection of files that contain data, history, metadata, and other attributes.
D. Delta tables are stored in a collection of files that contain only the data stored within the table.
E. Delta tables are stored in a single file that contains only the data stored within the table.
Answer: C
Explanation:
C. Delta tables are stored in a collection of files that contain data, history, metadata, and other attributes.
Delta tables store data in a structured manner using Parquet files, and they also maintain metadata and
transaction logs in separate directories. This organization allows for versioning, transactional capabilities, and
metadata tracking in Delta Lake.
Question: 6 CertyIQ
Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the
existing Delta table my_table and save the updated table?
Answer: C
Explanation:
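Since the answer options are not reproduced above, here is one way the task could be accomplished, sketched as a Delta SQL statement issued from PySpark:

# Remove rows with age greater than 25; the Delta table is updated in place
spark.sql("DELETE FROM my_table WHERE age > 25")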
Question: 7 CertyIQ
A data engineer has realized that they made a mistake when making a daily update to a table. They need to use
Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to
time travel to the older version, they are unable to restore the data because the data files have been deleted.
Which of the following explains why the data files are no longer present?
Answer: A
Explanation:
The VACUUM command in Delta Lake is used to clean up and remove unnecessary data files that are no
longer needed for time travel or query purposes. When you run VACUUM with certain retention settings, it can
delete older data files, which might include versions of data that are older than the specified retention period.
If the data engineer is unable to restore the table to a version that is 3 days old because the data files have
been deleted, it's likely because the VACUUM command was run on the table, removing the older data files as
part of data cleanup.
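As a minimal sketch (the table name is hypothetical), the command that produces this behavior looks like:

# Permanently removes data files that are no longer referenced by the current table
# version and are older than the retention window (default 7 days); time travel
# beyond that window is then impossible
spark.sql("VACUUM my_table")
# A shorter window, e.g. VACUUM my_table RETAIN 48 HOURS, removes files needed for
# recent time travel and requires disabling Delta's retention-duration safety check first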
Question: 8 CertyIQ
Which of the following Git operations must be performed outside of Databricks Repos?
A. Commit
B. Pull
C. Push
D. Clone
E. Merge
Answer: E
Explanation:
Merge is the only Git operation listed here that cannot be performed within Databricks Repos; it must be done in the Git provider. Clone, commit, pull, and push are all supported in Repos.
https://learn.microsoft.com/en-us/azure/databricks/repos/
Question: 9 CertyIQ
Which of the following data lakehouse features results in improved data quality over a traditional data lake?
A. A data lakehouse provides storage solutions for structured and unstructured data.
B. A data lakehouse supports ACID-compliant transactions.
C. A data lakehouse allows the use of SQL queries to examine data.
D. A data lakehouse stores data in open formats.
E. A data lakehouse enables machine learning and artificial Intelligence workloads.
Answer: B
Explanation:
One of the key features of a data lakehouse that results in improved data quality over a traditional data lake is its support for ACID (Atomicity, Consistency, Isolation, Durability) transactions. ACID transactions provide data integrity and consistency guarantees, ensuring that operations on the data are reliable and that data is not left in an inconsistent state due to failures or concurrent access.
In a traditional data lake, such transactional guarantees are often lacking, making it challenging to maintain data quality, especially in scenarios involving multiple data writes, updates, or complex transformations. A data lakehouse, by offering ACID compliance, helps maintain data quality by providing strong consistency and reliability, which is crucial for data pipelines and analytics.
Question: 10 CertyIQ
A data engineer needs to determine whether to use the built-in Databricks Notebooks versioning or version their
project using Databricks Repos.
Which of the following is an advantage of using Databricks Repos over the Databricks Notebooks versioning?
Answer: B
Explanation:
An advantage of using Databricks Repos over the built-in Databricks Notebooks versioning is the ability to
work with multiple branches. Branching is a fundamental feature of version control systems like Git, which
Databricks Repos is built upon. It allows you to create separate branches for different tasks, features, or
experiments within your project. This separation helps in parallel development and experimentation without
affecting the main branch or the work of other team members.
Branching provides a more organized and collaborative development environment, making it easier to merge
changes and manage different development efforts. While Databricks Notebooks versioning also allows you
to track versions of notebooks, it may not provide the same level of flexibility and collaboration as branching
in Databricks Repos.
Question: 11 CertyIQ
A data engineer has left the organization. The data team needs to transfer ownership of the data engineer’s Delta
tables to a new data engineer. The new data engineer is the lead engineer on the data team.
Assuming the original data engineer no longer has access, which of the following individuals must be the one to
transfer ownership of the Delta tables in Data Explorer?
Answer: C
Explanation:
Reference:
https://www.databricks.com/blog/2022/08/26/databricks-workspace-administration-best-practices-for-account-workspace-and-metastore-admins.html
Question: 12 CertyIQ
A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from
the data engineering team to implement a series of tests to ensure the data is clean. However, the data
engineering team uses Python for its tests rather than SQL.
Which of the following commands could the data engineering team use to access sales in PySpark?
A. SELECT * FROM sales
B. There is no way to share data between PySpark and SQL.
C. spark.sql("sales")
D. spark.delta.table("sales")
E. spark.table("sales")
Answer: E
Explanation:
Reference:
https://spark.apache.org/docs/3.2.1/api/python/reference/api/pyspark.sql.SparkSession.table.html
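A minimal sketch of how the data engineering team might use this in a PySpark test (the amount column and the specific check are hypothetical):

# Load the SQL-defined Delta table as a PySpark DataFrame
sales_df = spark.table("sales")
# Example data-quality check: no null values in a hypothetical amount column
assert sales_df.filter("amount IS NULL").count() == 0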
Question: 13 CertyIQ
Which of the following commands will return the location of database customer360?
Answer: C
Explanation:
To retrieve the location of a database named "customer360" in a database management system like Hive or
Databricks, you can use the DESCRIBE DATABASE command followed by the database name. This command
will provide information about the database, including its location.
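A minimal sketch, issued from PySpark for consistency with the other examples in this guide:

# Returns database metadata, including its location
spark.sql("DESCRIBE DATABASE customer360").show(truncate=False)
# The EXTENDED variant also includes database properties
spark.sql("DESCRIBE DATABASE EXTENDED customer360").show(truncate=False)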
Question: 14 CertyIQ
A data engineer wants to create a new table containing the names of customers that live in France.
They have written the following command:
A senior data engineer mentions that it is organization policy to include a table property indicating that the new
table includes personally identifiable information (PII).
Which of the following lines of code fills in the above blank to successfully complete the task?
A. There is no way to indicate whether a table contains PII.
B. "COMMENT PII"
C. TBLPROPERTIES PII
D. COMMENT "Contains PII"
E. PII
Answer: D
Explanation:
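A minimal sketch of a command of this shape (the table name, column list, and source table are hypothetical, since the original command is not reproduced above):

spark.sql("""
CREATE TABLE customers_in_france
COMMENT "Contains PII"
AS SELECT customer_name FROM customers WHERE country = 'France'
""")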
Question: 15 CertyIQ
Which of the following benefits is provided by the array functions from Spark SQL?
Answer: D
Explanation:
D. An ability to work with complex, nested data ingested from JSON files.
Array functions in Spark SQL are primarily used for working with arrays and complex, nested data structures,
such as those often encountered when ingesting JSON files. These functions allow you to manipulate and
query nested arrays and structures within your data, making it easier to extract and work with specific
elements or values within complex data formats.
While some of the other options (such as option A for working with different data types) are features of Spark
SQL or SQL in general, array functions specifically excel at handling complex, nested data structures like
those found in JSON files.
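A minimal sketch using a small in-memory DataFrame (hypothetical data standing in for parsed JSON):

from pyspark.sql import functions as F

# Hypothetical nested data, as might result from reading JSON
df = spark.createDataFrame([(1, ["a", "b", "c"]), (2, ["b"])], ["id", "tags"])
# Flatten the array into one row per element
df.select("id", F.explode("tags").alias("tag")).show()
# Test membership within the array column
df.select("id", F.array_contains("tags", "b").alias("has_b")).show()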
Question: 16 CertyIQ
Which of the following commands can be used to write data into a Delta table while avoiding the writing of
duplicate records?
A. DROP
B. IGNORE
C. MERGE
D. APPEND
E. INSERT
Answer: C
Explanation:
C. MERGE
The MERGE command is used to write data into a Delta table while avoiding the writing of duplicate records. It
allows you to perform an "upsert" operation, which means that it will insert new records and update existing
records in the Delta table based on a specified condition. This helps maintain data integrity and avoid
duplicates when adding new data to the table.
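A minimal sketch of an upsert with MERGE (the table names and the id join key are hypothetical):

spark.sql("""
MERGE INTO transactions AS t
USING transactions_updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
""")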
Question: 17 CertyIQ
A data engineer needs to apply custom logic to string column city in table stores for a specific use case. In order to
apply this custom logic at scale, the data engineer wants to create a SQL user-defined function (UDF).
Which of the following code blocks creates this SQL UDF?
A.
B.
C.
D.
E.
Answer: A
Explanation:
Option E is incorrect: a SQL user-defined function is created with CREATE FUNCTION, not CREATE UDF. That leaves options A and D, and since D's RETURN CASE construction is not valid as written, the correct answer is A.
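A minimal sketch of the CREATE FUNCTION form (the function body here is hypothetical, since the original options are not reproduced above):

# Define a SQL UDF that applies custom string logic
spark.sql("""
CREATE FUNCTION clean_city(city STRING)
RETURNS STRING
RETURN initcap(trim(city))
""")
# Apply it at scale to the stores table
spark.sql("SELECT clean_city(city) AS city_clean FROM stores").show()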
Question: 18 CertyIQ
A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day.
They only want the final query in the program to run on Sundays. They ask for help from the data engineering team
to complete this task.
Which of the following approaches could be used by the data engineering team to complete this task?
A. They could submit a feature request with Databricks to add this functionality.
B. They could wrap the queries using PySpark and use Python’s control flow system to determine when to run
the final query.
C. They could only run the entire program on Sundays.
D. They could automatically restrict access to the source table in the final query so that it is only accessible on
Sundays.
E. They could redesign the data model to separate the data used in the final query into a new table.
Answer: B
Explanation:
They could wrap the queries using PySpark and use Python’s control flow system to determine when to run
the final query.
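A minimal sketch of this approach (the query texts are hypothetical placeholders for the analyst's program):

from datetime import datetime

# Queries that run every day
spark.sql("SELECT * FROM daily_metrics")          # hypothetical query
# The final query runs only on Sundays (Monday=0 ... Sunday=6)
if datetime.today().weekday() == 6:
    spark.sql("SELECT * FROM weekly_summary")     # hypothetical final query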
Question: 19 CertyIQ
A data engineer runs a statement every day to copy the previous day’s sales into the table transactions. Each day’s
sales are in their own file in the location "/transactions/raw".
Today, the data engineer runs the following command to complete this task:
After running the command today, the data engineer notices that the number of records in table transactions has
not changed.
Which of the following describes why the statement might not have copied any new records into the table?
A. The format of the files to be copied were not included with the FORMAT_OPTIONS keyword.
B. The names of the files to be copied were not included with the FILES keyword.
C. The previous day’s file has already been copied into the table.
D. The PARQUET file format does not support COPY INTO.
E. The COPY INTO statement requires the table to be refreshed to view the copied rows.
Answer: C
Explanation:
The previous day’s file has already been copied into the table.
Reference:
https://docs.databricks.com/ingestion/copy-into/tutorial-notebook.html
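A minimal sketch of a statement of this shape (the exact command is not reproduced above, so the file format shown is illustrative). COPY INTO is idempotent, so files that were already loaded are skipped on later runs:

spark.sql("""
COPY INTO transactions
FROM '/transactions/raw'
FILEFORMAT = PARQUET
""")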
Question: 20 CertyIQ
A data engineer needs to create a table in Databricks using data from their organization’s existing SQLite
database.
They run the following command:
Which of the following lines of code fills in the above blank to successfully complete the task?
A. org.apache.spark.sql.jdbc
B. autoloader
C. DELTA
D. sqlite
E. org.apache.spark.sql.sqlite
Answer: A
Explanation:
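A minimal sketch of the kind of command described (the table name, connection URL, and source table are hypothetical, since the original command is not reproduced above; the SQLite JDBC driver must be installed on the cluster):

spark.sql("""
CREATE TABLE customers_jdbc
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:sqlite:/dbfs/tmp/company.db',
  dbtable 'customers'
)
""")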
Question: 21 CertyIQ
A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions
in the month of March. The second table april_transactions is a collection of all retail transactions in the month of
April. There are no duplicate records between the tables.
Which of the following commands should be run to create a new table all_transactions that contains all records
from march_transactions and april_transactions without duplicate records?
Explanation:
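Since the answer options and answer key are not reproduced above, here is one command that accomplishes the described task; UNION removes duplicates, which is safe here because the two tables share no records:

spark.sql("""
CREATE TABLE all_transactions AS
SELECT * FROM march_transactions
UNION
SELECT * FROM april_transactions
""")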
Question: 22 CertyIQ
A data engineer only wants to execute the final block of a Python program if the Python variable day_of_week is
equal to 1 and the Python variable review_period is True.
Which of the following control flow statements should the data engineer use to begin this conditionally executed
code block?
Answer: D
Explanation:
This statement will check if the variable day_of_week is equal to 1 and if the variable review_period evaluates
to a truthy value. The use of the double equal sign (==) in the comparison of day_of_week is important, as a
single equal sign (=) would be used to assign a value to the variable instead of checking its value. The use of a
single ampersand (&) instead of the keyword and is not valid syntax in Python. The use of quotes around True
in options B and C will result in a string comparison, which will not evaluate to True even if the value of
review_period is True.
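A minimal sketch of the conditional described (the variable values are illustrative):

day_of_week = 1      # illustrative values
review_period = True

if day_of_week == 1 and review_period:
    print("running final block")   # stands in for the final block of the program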
Question: 23 CertyIQ
A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table
metadata and data.
They run the following command:
Answer: C
Explanation:
C is the correct answer. For an external table, DROP TABLE removes only the table metadata; the underlying data files remain in storage. To delete them as well, find the table's location (for example, with DESCRIBE TABLE EXTENDED) and remove the files at that location.
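A minimal sketch of that cleanup, assuming a Databricks notebook where dbutils is available (the table name is hypothetical):

# Find the external table's storage location before dropping it
location = (spark.sql("DESCRIBE TABLE EXTENDED my_table")
                 .filter("col_name = 'Location'")
                 .collect()[0]["data_type"])
# DROP TABLE removes the metadata; for an external table the files remain
spark.sql("DROP TABLE my_table")
# Delete the underlying files explicitly
dbutils.fs.rm(location, True)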
Question: 24 CertyIQ
A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data
engineers in other sessions. It also must be saved to a physical location.
Which of the following data entities should the data engineer create?
A. Database
B. Function
C. View
D. Temporary view
E. Table
Answer: E
Explanation:
E. Table
To create a data entity that can be used by other data engineers in other sessions and must be saved to a
physical location, you should create a table. Tables in a database are physical storage structures that hold
data, and they can be accessed and shared by multiple users and sessions. By creating a table, you provide a
permanent and structured storage location for the data entity that can be used across different sessions and
by other users as needed.
Options like databases (A) can be used to organize tables, views (C) can provide virtual representations of
data, and temporary views (D) are temporary in nature and don't save data to a physical location. Functions (B)
are typically used for processing data or performing calculations, not for storing data.
Question: 25 CertyIQ
A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data
is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the
quality level.
Which of the following tools can the data engineer use to solve this problem?
A. Unity Catalog
B. Data Explorer
C. Delta Lake
D. Delta Live Tables
E. Auto Loader
Answer: D
Explanation:
D. Delta Live Tables
Delta Live Tables is a tool provided by Databricks that can help data engineers automate the monitoring of
data quality. It is designed for managing data pipelines, monitoring data quality, and automating workflows.
With Delta Live Tables, you can set up data quality checks and alerts to detect issues and anomalies in your
data as it is ingested and processed in real-time. It provides a way to ensure that the data quality meets your
desired standards and can trigger actions or notifications when issues are detected.
While the other tools mentioned may have their own purposes in a data engineering environment, Delta Live
Tables is specifically designed for data quality monitoring and automation within the Databricks ecosystem.
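A minimal sketch of a Delta Live Tables definition with a quality expectation, to run inside a DLT pipeline (the source path and the constraint are hypothetical):

import dlt

@dlt.table
@dlt.expect("valid_id", "id IS NOT NULL")   # violations are tracked in the pipeline's quality metrics
def raw_orders():
    # Hypothetical landing location for newly ingested files
    return spark.read.format("json").load("/landing/orders")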
Question: 26 CertyIQ
A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are
defined against Delta Lake table sources using LIVE TABLE.
The pipeline is configured to run in Production mode using the Continuous Pipeline Mode.
Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after
clicking Start to update the pipeline?
A. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will
persist to allow for additional testing.
B. All datasets will be updated once and the pipeline will persist without any processing. The compute
resources will persist but go unused.
C. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be
deployed for the update and terminated when the pipeline is stopped.
D. All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.
E. All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow
for additional testing.
Answer: C
Explanation:
All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be
deployed for the update and terminated when the pipeline is stopped.
Question: 27 CertyIQ
In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any
kind of failure by restarting and/or reprocessing, which of the following two approaches is used by Spark to record
the offset range of the data being processed in each trigger?
Answer: A
Explanation:
Structured Streaming uses checkpointing and write-ahead logs to record the offset range of the data processed in each trigger, so that a restarted query can resume from, or reprocess, exactly the same range after a failure.
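A minimal sketch showing where that progress information is stored, via the checkpointLocation option (the table names and path are hypothetical):

(spark.readStream.table("bronze_events")
      .writeStream
      .option("checkpointLocation", "/checkpoints/bronze_to_silver")  # offsets and write-ahead log live here
      .toTable("silver_events"))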
Question: 28 CertyIQ
Which of the following describes the relationship between Gold tables and Silver tables?
A. Gold tables are more likely to contain aggregations than Silver tables.
B. Gold tables are more likely to contain valuable data than Silver tables.
C. Gold tables are more likely to contain a less refined view of data than Silver tables.
D. Gold tables are more likely to contain more data than Silver tables.
E. Gold tables are more likely to contain truthful data than Silver tables.
Answer: A
Explanation:
A. Gold tables are more likely to contain aggregations than Silver tables.
In some data processing pipelines, especially those following a typical "Bronze-Silver-Gold" data lakehouse
architecture, Silver tables are often considered a more refined version of the raw or Bronze data. Silver tables
may include data cleansing, schema enforcement, and some initial transformations.
Gold tables, on the other hand, typically represent a stage where data is further enriched, aggregated, and
processed to provide valuable insights for analytical purposes. This could indeed involve more aggregations
compared to Silver tables.
Question: 29 CertyIQ
Which of the following describes the relationship between Bronze tables and raw data?
Answer: E
Explanation:
In a typical data processing pipeline following a "Bronze-Silver-Gold" data lakehouse architecture, Bronze
tables are the initial stage where raw data is ingested and transformed into a structured format with a schema
applied. The schema provides structure and meaning to the raw data, making it more usable and accessible
for downstream processing.
Therefore, Bronze tables contain the raw data but in a structured and schema-enforced format, which makes
them distinct from the unprocessed, unstructured raw data files.
Question: 30 CertyIQ
Which of the following tools is used by Auto Loader to process data incrementally?
A. Checkpointing
B. Spark Structured Streaming
C. Data Explorer
D. Unity Catalog
E. Databricks SQL
Answer: B
Explanation:
The Auto Loader process in Databricks is typically used in conjunction with Spark Structured Streaming to process data incrementally. Spark Structured Streaming is a real-time data processing framework that allows you to process data streams incrementally as new data arrives. Auto Loader is a feature in Databricks that works with Structured Streaming to automatically detect and process new data files as they are added to a specified data source location, allowing for incremental data processing without the need for manual intervention.
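A minimal sketch of Auto Loader driving a Structured Streaming write (the paths, file format, and table name are hypothetical):

(spark.readStream
      .format("cloudFiles")                                    # Auto Loader source
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/autoloader/schemas/orders")
      .load("/landing/orders")
      .writeStream
      .option("checkpointLocation", "/autoloader/checkpoints/orders")
      .toTable("bronze_orders"))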
Question: 31 CertyIQ
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then
perform a streaming write into a new table.
The code block used by the data engineer is below:
If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the
following lines of code should the data engineer use to fill in the blank?
A. trigger("5 seconds")
B. trigger()
C. trigger(once="5 seconds")
D. trigger(processingTime="5 seconds")
E. trigger(continuous="5 seconds")
Answer: D
Explanation:
trigger(processingTime="5 seconds")
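A minimal sketch of where the trigger fits (the original code block is not reproduced above, so the table names are hypothetical):

(spark.readStream.table("raw_events")
      .writeStream
      .trigger(processingTime="5 seconds")                     # one micro-batch every 5 seconds
      .option("checkpointLocation", "/checkpoints/raw_to_new")
      .toTable("new_events"))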
Question: 32 CertyIQ
A dataset has been defined using Delta Live Tables and includes an expectations clause:
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW
What is the expected behavior when a batch of data containing data that violates these constraints is processed?
A. Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.
B. Records that violate the expectation are added to the target dataset and flagged as invalid in a field added
to the target dataset.
C. Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event
log.
D. Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.
E. Records that violate the expectation cause the job to fail.
Answer: C
Explanation:
C. Records that violate the expectation are dropped from the target dataset and recorded as invalid in the
event log.
Invalid rows will be dropped as requested by the constraint and flagged as such in log files. If you need a
quarantine table, you'll have to write more code.
With the defined constraint and expectation clause, when a batch of data is processed, any records that
violate the expectation (in this case, where the timestamp is not greater than '2020-01-01') will be dropped
from the target dataset. These dropped records will also be recorded as invalid in the event log, allowing for
auditing and tracking of the data quality issues without causing the entire job to fail.
Reference:
https://docs.databricks.com/en/delta-live-tables/expectations.html
Question: 33 CertyIQ
Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE
INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT)
tables using SQL?
A. CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.
B. CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.
C. CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.
D. CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated
aggregations.
E. CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.
Answer: B
Explanation:
B. CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally. The
CREATE STREAMING LIVE TABLE syntax is used to create tables that read data incrementally, while the
CREATE LIVE TABLE syntax is used to create tables that read data in batch mode. Delta Live Tables support
both streaming and batch modes of processing data. When the data is streamed and needs to be processed
incrementally, CREATE STREAMING LIVE TABLE should be used.
Question: 34 CertyIQ
A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also
used by other processes. As a result, the files should be kept as is and will accumulate in the directory. The data
engineer needs to identify which files are new since the previous run in the pipeline, and set up the pipeline to only
ingest those new files with each run.
Which of the following tools can the data engineer use to solve this problem?
A. Unity Catalog
B. Delta Lake
C. Databricks SQL
D. Data Explorer
E. Auto Loader
Answer: E
Explanation:
Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage without any
additional setup.
Reference:
https://docs.databricks.com/en/ingestion/auto-loader/index.html
Question: 35 CertyIQ
Which of the following Structured Streaming queries is performing a hop from a Silver table to a Gold table?
A.
B.
C.
D.
E.
Answer: E
Explanation:
E is the right answer. The "gold layer" is used to store aggregated clean data, E is the only answer in which
aggregation is performed.
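A minimal sketch of the kind of query described by option E (the original code is not reproduced above, so the table and column names are hypothetical); the aggregation is what marks the hop into the Gold layer:

from pyspark.sql import functions as F

(spark.readStream.table("sales_silver")
      .groupBy("store_id")
      .agg(F.sum("amount").alias("total_amount"))
      .writeStream
      .outputMode("complete")                                  # the full aggregate result is rewritten each micro-batch
      .option("checkpointLocation", "/checkpoints/sales_gold")
      .toTable("sales_gold"))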
Question: 36 CertyIQ
A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop
invalid records at each table. They notice that some data is being dropped due to quality concerns at some point in
the DLT pipeline. They would like to determine at which table in their pipeline the data is being dropped.
Which of the following approaches can the data engineer take to identify the table that is dropping the records?
A. They can set up separate expectations for each table when developing their DLT pipeline.
B. They cannot determine which table is dropping the records.
C. They can set up DLT to notify them via email when records are dropped.
D. They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.
E. They can navigate to the DLT pipeline page, click on the “Error” button, and review the present errors.
Answer: D
Explanation:
D. They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.
To identify the table in a Delta Live Tables (DLT) pipeline where data is being dropped due to quality concerns,
the data engineer can navigate to the DLT pipeline page, click on each table in the pipeline, and view the data
quality statistics. These statistics often include information about records dropped, violations of expectations,
and other data quality metrics. By examining the data quality statistics for each table in the pipeline, the data
engineer can determine at which table the data is being dropped.
Question: 37 CertyIQ
A data engineer has a single-task Job that runs each morning before they begin working. After identifying an
upstream data issue, they need to set up another task to run a new notebook prior to the original task.
Which of the following approaches can the data engineer use to set up the new task?
A. They can clone the existing task in the existing Job and update it to run the new notebook.
B. They can create a new task in the existing Job and then add it as a dependency of the original task.
C. They can create a new task in the existing Job and then add the original task as a dependency of the new task.
D. They can create a new job from scratch and add both tasks to run concurrently.
E. They can clone the existing task to a new Job and then edit it to run the new notebook.
Answer: B
Explanation:
B. They can create a new task in the existing Job and then add it as a dependency of the original task. Adding
a new task as a dependency to an existing task in the same Job allows the new task to run before the original
task is executed. This ensures that the data engineer can run the new notebook prior to the original task
without having to create a new Job from scratch. Cloning the existing task or creating a new Job would add
unnecessary complexity to the pipeline.
Question: 38 CertyIQ
An engineering manager wants to monitor the performance of a recent project using a Databricks SQL query. For
the first week following the project’s release, the manager wants the query results to be updated every minute.
However, the manager is concerned that the compute resources used for the query will be left running and cost
the organization a lot of money beyond the first week of the project’s release.
Which of the following approaches can the engineering team use to ensure the query does not cost the
organization any money beyond the first week of the project’s release?
A. They can set a limit to the number of DBUs that are consumed by the SQL Endpoint.
B. They can set the query’s refresh schedule to end after a certain number of refreshes.
C. They cannot ensure the query does not cost the organization money beyond the first week of the project’s release.
D. They can set a limit to the number of individuals that are able to manage the query’s refresh schedule.
E. They can set the query’s refresh schedule to end on a certain date in the query scheduler.
Answer: E
Explanation:
The correct answer is E. They can set the query's refresh schedule to end on a certain date in the query
scheduler. Databricks SQL supports a query scheduler that enables users to schedule SQL queries to run at
defined intervals. By default, scheduled queries run indefinitely. However, users can configure the scheduler
to stop running queries at a specific time or after a specific number of runs. In this scenario, the engineering
team can set the query's refresh schedule to end on a certain date, ensuring that the query does not run
beyond the first week of the project's release and potentially cost the organization more money.
Question: 39 CertyIQ
A data analysis team has noticed that their Databricks SQL queries are running too slowly when connected to their
always-on SQL endpoint. They claim that this issue is present when many members of the team are running small
queries simultaneously. They ask the data engineering team for help. The data engineering team notices that each
of the team’s queries uses the same SQL endpoint.
Which of the following approaches can the data engineering team use to improve the latency of the team’s
queries?
Answer: B
Explanation:
They can increase the maximum bound of the SQL endpoint’s scaling range.