Basics of Power Query in MS Excel, Power BI
Q1. Explain the concept of data shaping and modelling in PQ.
A1. Data Shaping (Power Query):
Focus:
o Data shaping is primarily concerned with transforming the structure and format of your
data. It's about cleaning, organizing, and preparing your data for analysis.
o This occurs within the Power Query Editor.
Key Actions:
o Cleaning: Removing errors, inconsistencies, and unwanted data (e.g., blank rows, duplicate
entries).
o Transforming: Changing data types (e.g., text to numbers, dates), splitting columns,
merging columns, and pivoting/unpivoting data.
o Structuring: Reshaping the data to make it consistent and suitable for analysis.
o Appending and Merging: Combining data from different sources.
Purpose:
o To get your data into a clean, consistent, and usable format.
o To ensure that your data is structured in a way that allows for effective analysis.
Data Modelling (Power BI):
Focus:
o Data modelling involves defining the relationships between different tables of data.
o It's about creating a structured framework that allows you to analyse data from multiple
sources in a cohesive way.
o This occurs within the model view of Power BI.
Key Actions:
o Establishing Relationships: Defining how different tables are connected (e.g., one-to-
many, many-to-many).
o Creating Measures: Writing DAX (Data Analysis Expressions) formulas to perform
calculations and derive insights.
o Defining Table Structures: Optimizing the table structure for performance and analysis
(e.g., using star schemas).
Purpose:
o To create a robust and efficient data structure that enables you to perform complex analyses
and gain meaningful insights.
o To allow for data to be used in a relational way, so that data from separate tables can be
used together.
In essence:
Data shaping prepares the individual tables.
Data modelling connects those tables and defines how they interact.
Therefore, data shaping is often a precursor to data modelling. You first shape your data to ensure it's
clean and consistent, and then you model it to create relationships and enable comprehensive analysis.
Q2. How can you create custom columns, group data in PQ?
A2. Creating Custom Columns:
Power Query's "Add Column" tab is where you'll find the tools for this. Here's a general process:
Open Power Query Editor:
o In Excel or Power BI, access the Power Query Editor.
Go to "Add Column":
o Find the "Add Column" tab on the ribbon.
Choose "Custom Column":
o Click the "Custom Column" button. This will open the "Custom Column" dialog box.
Write Your Formula:
o In the dialog box, you'll:
Give your new column a name.
Write a formula using Power Query's M language. You can use existing columns,
operators, and functions.
Power Query will check your formula for syntax errors.
o Example: If you have columns "Quantity" and "Price," you could create a "Total Sales"
column with the formula [Quantity] * [Price].
Click "OK":
o The new column will be added to your table.
Grouping Data:
Grouping data allows you to summarize information based on the values in one or more columns. Here's
how:
Select Columns:
o In the Power Query Editor, select the column(s) you want to group by.
Use "Group By":
o You can find the "Group By" button on the "Home" or "Transform" tab.
o Clicking "Group By" opens the "Group By" dialog box.
Configure Grouping:
o In the dialog box, you can:
Choose the columns to group by.
Specify the new column name for the grouped data.
Select an aggregation operation (e.g., sum, average, count).
You can also use the advanced feature to group by multiple columns, and add
multiple aggregations.
Click "OK":
o Power Query will group the rows based on your settings.
Q3. What are some base practices for using PQ to efficiently clean and transfer data?
A3. 1. Connect to the Right Data Source:
Use Native Connectors:
o Power Query offers a wide range of connectors. Prioritize using the specific connector
designed for your data source (e.g., SQL Server connector for SQL Server databases) over
generic connectors like ODBC. This often leads to better performance and access to source-
specific optimizations.
Target Specific Data:
o Instead of importing entire databases or files, use query options or filters within the
connector to retrieve only the necessary data. This reduces the amount of data Power Query
has to process.
2. Optimize Data Transformation:
Filter Early:
o Apply filters as early as possible in your query. This minimizes the amount of data that
subsequent steps need to process. If your data source supports "query folding" then this is
very important.
Set Correct Data Types:
o Ensure that columns have the correct data types (e.g., numbers, dates, text). This improves
performance and prevents errors in later analysis.
Remove Unnecessary Columns and Rows:
o Eliminate any columns or rows that are not needed for your analysis. This reduces the size
of your data and improves performance.
Perform Grouping and Aggregation Early:
o If you need to summarize data, perform grouping and aggregation operations as early as
possible. This reduces the number of rows that need to be processed in later steps.
Use "Keep Rows" and "Remove Rows" Effectively:
o These functions are very useful for cleaning up rows that have errors, or are not necessary.
Use the trim function:
o Extra spaces in text-based data can cause issues. Use the trim function to remove those
spaces.
3. Query Folding:
Understand Query Folding:
o Query folding is a key optimization feature. It allows Power Query to translate
transformation steps into native queries that are executed by the data source itself. This
significantly improves performance, especially with large datasets.
Design Queries for Folding:
o Structure your queries in a way that maximizes query folding. Not all transformations can be
folded, so it's important to be aware of which steps support it.
4. Modular Approach:
Break Down Complex Queries:
o Divide complex transformations into smaller, more manageable queries. This makes it easier
to troubleshoot and maintain your queries.
Use Functions:
o When you have to repeat the same steps, create functions. This will make your work much
more efficient.
5. Error Handling:
Implement Error Handling:
o Use Power Query's error handling capabilities to gracefully handle errors in your data. This
prevents your queries from failing and allows you to identify and correct data quality issues.
Review Query Steps:
o When an error occurs, review each query step to find where the error is occurring.
Q4. How can you optimize PQ for better performance?
A4. 1. Leverage Query Folding:
Understand Its Importance:
o Query folding pushes data transformation operations back to the data source (e.g., SQL
Server), allowing the source to perform the work, which is typically much faster.
Design for Folding:
o Prioritize transformations that support folding (e.g., filtering, sorting, basic calculations).
o Avoid transformations that break folding (e.g., certain custom functions, complex conditional
logic).
o Test to see if Query folding is occurring. You can do this by right clicking on a step, and
seeing if "view native query" is greyed out or not. If it is not greyed out, query folding is
occurring.
Filter Early:
o Place filter steps as early as possible in your query to reduce the amount of data processed.
2. Data Source Optimization:
Use Native Connectors:
o Prefer specific connectors (e.g., SQL Server connector) over generic ones (e.g., ODBC) for
better performance.
Retrieve Only Necessary Data:
o If possible, use database views or stored procedures to pre-filter and aggregate data at the
source.
o Limit the number of columns and rows retrieved.
3. Transformation Efficiency:
Set Data Types Early:
o Define data types as soon as possible to avoid implicit conversions, which can slow down
processing.
Remove Unnecessary Columns/Rows:
o Eliminate columns and rows that are not needed for analysis to reduce data size.
Optimize Custom Columns:
o Use efficient M language expressions.
o Avoid complex calculations that can be performed at the data source.
Group and Aggregate Early:
o Perform grouping and aggregation operations as early as possible to reduce the number of
rows.
Use Indexed Columns:
o If merging large tables, make sure the columns that are being merged on are indexed.
Disable Background Data Preview:
o When working with very large datasets, disable background data preview in the Power
Query editor options. This prevents Power Query from attempting to load and display large
amounts of data while you are working on your query.
4. Query Structure and Organization:
Break Down Complex Queries:
o Divide large, complex queries into smaller, more manageable queries.
o Use functions to encapsulate reusable transformation logic.
Disable Step Preview:
o Disabling the step preview can stop PQ from attempting to render a preview of every step,
which can save a lot of processing power.
Reduce the number of steps:
o Combining steps, when possible, can reduce the overhead of PQ.
5. Data Loading and Storage:
Optimize Data Loading:
o If loading data into Excel, consider using the data model to reduce file size.
o If loading data into Power BI, optimize the data model for performance.
Data Compression:
o Power BI has very good compression, which helps with performance.
Q5. What is power bi desktop?
A5. Power BI Desktop is a free desktop application from Microsoft that allows you to connect to, transform,
and visualize data. Essentially, it's the tool you use to create Power BI reports. Here's a breakdown of its
key functions:
Data Connection:
o Power BI Desktop can connect to a wide range of data sources, including databases,
spreadsheets, cloud services, and more.
Data Transformation:
o It includes Power Query Editor, which enables you to clean, shape, and transform your data
before analysing it. This involves tasks like removing errors, formatting data, and combining
data from different sources.
Data Modelling:
o You can create data models by establishing relationships between different tables, creating
calculated columns, and defining measures using DAX (Data Analysis Expressions).
Data Visualization:
o Power BI Desktop provides a rich set of visualization tools to create interactive charts,
graphs, and other visual representations of your data.
Report Creation:
o You can design reports by arranging visualizations on report pages, adding filters, and
creating interactive elements.
Report Sharing:
o Once you've created a report, you can publish it to the Power BI service, where it can be
shared with others.
In simple terms, Power BI Desktop is where you build your Power BI reports. Then, the Power BI service is
where you share and distribute those reports.
Q6.