Practice Exam – Databricks (Real)
Q) Flow of users in a website – what kind of visualization should be used?
• Choropleth map is a good option(?)
• Not Sankey
• Cohorts is the answer

Q) Where is the cache stored? How to check whether a query is taking data from the cache or not?
1. From endpoints or warehouse
2. From query history
Databricks SQL UI caching: Per user caching of all query and dashboard results in
the Databricks SQL UI.
During Public Preview, the default behavior is that both the queries and the query results are cached forever and are located within the Databricks filesystem in your account. You can delete query results by re-running the query that you no longer want to be stored. Once re-run, the old query results are removed from the cache.
Query results caching: Per cluster caching of query results for all queries
through SQL warehouses.
To disable query result caching, you can run SET use_cached_result = false in the
SQL editor.
If "Query profile is not available" is displayed, no profile is available for this query. A query profile is not available for queries that run from the query cache. To circumvent the query cache, make a trivial change to the query, such as changing or removing the LIMIT clause.
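A minimal sketch of both points as they would look in the SQL editor (the sales table is hypothetical):

-- Disable query result caching for the current session
SET use_cached_result = false;

-- Or bypass the cache for a single query (and get a query profile)
-- by making a trivial change, e.g. tweaking the LIMIT
SELECT * FROM sales LIMIT 101;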
Q) By default, which visualization is selected?
Q) Transfer ownership of a dashboard
If a dashboard's owner is removed from a workspace, the dashboard no longer has an owner, and only an admin user can manage the dashboard's permissions.
An admin user can transfer ownership of any dashboard, including one without an owner, to a different user.
• Does that mean non-admins cannot?
To transfer ownership by using the Databricks SQL UI:
1. Open the dashboard.
2. Click Share.
3. Click Assign new owner.
4. Select the new user you'd like to make the owner from the dropdown and click Confirm.
If the dashboard previously had an owner, that user no longer has the Can Manage permission on the dashboard. The user to whom you gave the Can Manage permission is now the owner.
Q) A gold layer table is there; one table is added, or some transformation needs to be done. What is this called?
• Last mile ETL
• Ad hoc improvement?
• Last mile dashboarding

Q) CREATE VIEW syntax – USING or AS?
CREATE TEMPORARY VIEW subscribed_movies
AS
SELECT mo.member_id, mb.full_name, mo.movie_title
FROM movies AS mo
INNER JOIN members AS mb
ON mo.member_id = mb.id;
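For reference, the temporary view is then queried like any table:

SELECT member_id, full_name, movie_title
FROM subscribed_movies;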
Q) GROUP BY / PARTITION BY syntax – is PERCENT_RANK there or not?
percent_rank ranking window function (Databricks SQL) | Databricks on AWS
As it was starting from 0, I guess PERCENT_RANK is the answer.

Q) DROP TABLE syntax
DROP TABLE userdb.employeetable;
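A minimal sketch of PERCENT_RANK with a PARTITION BY clause (the employees table and its columns are hypothetical); PERCENT_RANK starts at 0 within each partition:

SELECT
  name,
  dept,
  salary,
  PERCENT_RANK() OVER (PARTITION BY dept ORDER BY salary) AS pct_rank
FROM employees;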
Q) Left semi join vs. left anti join – what is the difference, and does Databricks support these? Yes.
○ [ LEFT ] SEMI
Returns values from the left side of the relation that have a match with the right. It is also referred to as a left semi join.
○ [ LEFT ] ANTI
Returns values from the left relation that have no match with the right. It is also referred to as a left anti join.
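A minimal sketch of both joins, reusing the movies/members tables from the CREATE VIEW example above (the data itself is hypothetical):

-- Members that have at least one movie (left semi join)
SELECT mb.id, mb.full_name
FROM members AS mb
LEFT SEMI JOIN movies AS mo
  ON mb.id = mo.member_id;

-- Members that have no movies at all (left anti join)
SELECT mb.id, mb.full_name
FROM members AS mb
LEFT ANTI JOIN movies AS mo
  ON mb.id = mo.member_id;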
Q) Databricks SQL supports ANSI SQL – what is the advantage?
1. Faster
2. More customisation
SQL is used for a variety of tasks, such as querying data, controlling access to the database and its objects, guaranteeing database consistency, updating rows in a table, and creating, replacing, altering and dropping objects. SQL lets users work with data at the logical level.
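A few one-liners mapping to the tasks listed above (the sales table, the analysts group, and the column names are hypothetical):

-- Querying data
SELECT * FROM sales WHERE region = 'EMEA';

-- Controlling access to the database and its objects
GRANT SELECT ON TABLE sales TO `analysts`;

-- Updating rows in a (Delta) table
UPDATE sales SET status = 'closed' WHERE order_id = 42;

-- Creating, replacing, altering and dropping objects
CREATE OR REPLACE VIEW emea_sales AS SELECT * FROM sales WHERE region = 'EMEA';
DROP VIEW emea_sales;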
Q) Dashboard refresh interval
1 min – 1 week by default

Q) Dashboards do not support which of the following options?
1. Borders
2. Customize tooltips
3. Customize labels
Edit widgets
Advanced: The report will be emailed to subscribers every time it is updated.
Q) Adding a query parameter to a dashboard – what does it impact?
1. All the dashboards
2. Only that visual
https://docs.databricks.com/sql/user/queries/query-parameters.html
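A minimal sketch of a parameterized query (the trips table and the parameter names are hypothetical); in the Databricks SQL editor, the double curly braces create parameter widgets that dashboards can then expose:

SELECT pickup_date, COUNT(*) AS trips_count
FROM trips
WHERE pickup_date >= '{{ start_date }}'
  AND pickup_date < '{{ end_date }}'
GROUP BY pickup_date;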
Q) Who uses Databricks SQL as a secondary tool?
1. Business intelligence analyst
2. Business analyst
3. Data analyst
4. Data engineer

Q) A query is scheduled at a 4-hour interval, but the endpoint is taking time to start. What should be done to manage costs?
1. Increase the cluster size
2. Decrease the cluster size
3. Scale down
Top 5 Databricks Performance Tips - How to Speed Up Your Workloads - The
Databricks Blog
1. Use larger clusters. It may sound obvious, but this is the number one
problem we see. It’s actually not any more expensive to use a large cluster
for a workload than it is to use a smaller one. It’s just faster. If there’s
anything you should take away from this article, it’s this. Read section 1.
Really.
2. Use Photon, Databricks’ new, super-fast execution engine. Read section 2
to learn more. You won’t regret it.
3. Clean out your configurations. Configurations carried from one Apache
Spark™ version to the next can cause massive problems. Clean up! Read
section 3 to learn more.
4. Use Delta Caching. There's a good chance you're not using caching
correctly, if at all. See Section 4 to learn more (a short sketch follows after this list).
5. Be aware of lazy evaluation. If this doesn’t mean anything to you and
you’re writing Spark code, jump to section 5.
6. Bonus tip! Table design is super important. We’ll go into this in a future
blog, but for now, check out the guide on Delta Lake best practices.
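For the Delta caching tip above, a minimal sketch (the trips table and its columns are hypothetical); CACHE SELECT pre-loads the selected data into the disk (Delta) cache of the cluster that runs it:

CACHE SELECT trip_id, fare_amount
FROM trips
WHERE pickup_date >= '2022-01-01';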
Q) Data is refreshed every minute from a streaming dataset – what should the analyst raise as a concern?
Options
1. Streaming dataset doesn't support fault tolerance
2. It will be costly
3.

Q) INSERT INTO syntax
1. Wrong syntax – the syntax was correct
2. Appends the data, including duplicates

INSERT { OVERWRITE | INTO } [ TABLE ] table_name
    [ PARTITION clause ]
    [ ( column_name [, ...] ) ]
    query

> INSERT INTO students TABLE visiting_students;
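A minimal sketch of option 2 (the students columns and values are hypothetical); running the same INSERT INTO twice simply appends the rows again, duplicates included:

INSERT INTO students (name, student_id) VALUES ('Ada Lovelace', 1);
INSERT INTO students (name, student_id) VALUES ('Ada Lovelace', 1);
-- students now contains two identical rows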
Q) How does Fivetran connect with Databricks?
Fivetran automated data integration adapts as schemas and APIs change,
ensuring reliable data access and simplified analysis with ready-to-query
schemas.
You can integrate your Databricks SQL warehouses (formerly Databricks SQL
endpoints) and Databricks clusters with Fivetran.
The Fivetran integration with Databricks helps you centralize data from
disparate data sources into Delta Lake.
Note
Partner Connect does not integrate Fivetran with Databricks clusters. To
integrate a cluster with Fivetran, connect to Fivetran manually.