

Chris Webb's BI Blog: Optimising The Performance Of Power Query Merges In Power...
source link: https://blog.crossjoin.co.uk/2020/08/30/optimising-the-performance-of-power-query-merges-in-power-bi-part-5-cross-joins/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Optimising The Performance Of Power Query Merges In Power BI, Part 5: Cross Joins
In a late addition to the series of posts that I started here a few months ago, I thought it would be useful to look at how to optimise the performance of a rare (but nonetheless useful) type of merge: the cross join. Cross joins are not available as a join kind in the Table.Join or Table.NestedJoin M functions but there are two ways of creating them in Power Query:
- Creating a custom column in your query that contains the query that you want to cross join with and then expanding, as Matt Allington describes here
- Creating a custom column that contains a single value, such as the number 1, on both your queries and then using that column to merge the two queries on (something I mentioned a long time ago here)
Which one gives you the best performance? Apart from my old blog post there’s at least one other post suggesting the second approach is much faster, but I thought it would be useful to do some proper testing.
For my source data I created a query called SourceQuery that returned a table with one thousand rows, containing all the integers from 1 to 1000:

To test the first approach, I created a second query that referenced SourceQuery, then added a custom column that also returned the SourceQuery query:

…and then expanded this new custom column. The result was a query that returned a million rows – the cross join of the thousand-row SourceQuery table with itself.

Here’s the M code:
let
Source = SourceQuery,
#"Added Custom" = Table.AddColumn(
Source,
"Custom",
each SourceQuery
),
#"Expanded Custom" = Table.ExpandTableColumn(
#"Added Custom",
"Custom",
{"Column1"},
{"Custom.Column1"}
),
#"Changed Type" = Table.TransformColumnTypes(
#"Expanded Custom",
{{"Custom.Column1", Int64.Type}}
)
in
#"Changed Type"
The two timings from Profiler (the same ones that I have used to measure refresh performance throughout this series) were:
- Progress Report End/25 Execute SQL – 0.03 seconds
- Progress Report End/17 Read Data – 3.80 seconds
To test the second approach I added a custom column to the SourceQuery query that contained the value 1:

…and then created a new query that joined this query to itself:

…and then did an expand to achieve the same cross joined output as in the first test. Here’s the M code (all created in the Power Query Editor – no custom M code required):
let
Source = Table.NestedJoin(
#"SourceQuery With Join Column",
{"JoinColumn"},
#"SourceQuery With Join Column",
{"JoinColumn"},
"SourceQuery With Join Column",
JoinKind.Inner
),
#"Expanded SourceQuery With Join Column"
= Table.ExpandTableColumn(
Source,
"SourceQuery With Join Column",
{"Column1"},
{"SourceQuery With Join Column.Column1"}
),
#"Removed Columns" = Table.RemoveColumns(
#"Expanded SourceQuery With Join Column",
{"JoinColumn"}
)
in
#"Removed Columns"
The performance of this query was much faster:
- Progress Report End/25 Execute SQL – 0.03 seconds
- Progress Report End/17 Read Data – 0.80 seconds
So, as you can see, the second approach is the one to use.
There’s another advantage of this second approach too, if you’re using a foldable data source like SQL Server: it is possible to make query folding happen, which is of course incredibly important from a performance point of view, although you have to be careful not to change the data type of your custom column (or at least be careful how you do it). For example, here’s the M for a query that gets the cross join of the DimScenario table from the Adventure Works DW database with itself:
let
Source = Sql.Databases("localhost"),
AdventureWorksDW2017 = Source{[Name
= "AdventureWorksDW2017"]}[Data],
dbo_DimScenario = AdventureWorksDW2017{[
Schema = "dbo",
Item = "DimScenario"
]}[Data],
#"Removed Columns" = Table.RemoveColumns(
dbo_DimScenario,
{"FactFinance"}
),
#"Added Custom" = Table.AddColumn(
#"Removed Columns",
"Custom",
each 1
),
#"Merged Queries" = Table.NestedJoin(
#"Added Custom",
{"Custom"},
#"Added Custom",
{"Custom"},
"Added Custom",
JoinKind.Inner
),
#"Expanded Added Custom"
= Table.ExpandTableColumn(
#"Merged Queries",
"Added Custom",
{"ScenarioKey", "ScenarioName", "Custom"},
{
"Added Custom.ScenarioKey",
"Added Custom.ScenarioName",
"Added Custom.Custom"
}
),
#"Removed Other Columns" = Table.SelectColumns(
#"Expanded Added Custom",
{
"ScenarioName",
"Added Custom.ScenarioName"
}
)
in
#"Removed Other Columns"
Here’s the SQL code generated by this query:
select [$Outer].[ScenarioName] as [ScenarioName],
[$Inner].[ScenarioName2] as [Added Custom.ScenarioName]
from
(
select [_].[ScenarioKey] as [ScenarioKey],
[_].[ScenarioName] as [ScenarioName],
1 as [Custom]
from [dbo].[DimScenario] as [_]
) as [$Outer]
inner join
(
select [_].[ScenarioKey] as [ScenarioKey2],
[_].[ScenarioName] as [ScenarioName2],
1 as [Custom2]
from [dbo].[DimScenario] as [_]
) as [$Inner] on ([$Outer].[Custom] = [$Inner].[Custom2])
Like this:
Recommend
-
7
A Function To Visualise Parallelism In Power Query Diagnostics Data Most of the time I’ve spent looking at Power Query Query Diagnostics data, I’ve been l...
-
13
Power BI Report Performance And The Number Of Visuals On A Page When you’re faced with a slow Power BI report it’s very easy to assume that the problem is something to do with the dataset, how it’s modelled and how th...
-
38
The M Behind The New Power Query Data Types In Excel The big news this week – at least for me – was the release of the new Power Query data types to the Excel insiders channel. You can read all about it here:
-
6
Displaying Routes On A Map In Power BI In last week’s post I described the new Power Query M functions for w...
-
25
Power Query Geography And Geometry Functions In Power BI And Excel In the August 2020 release of Power BI Desktop a couple of new Power Query functions were added:
-
6
Using Small Multiples In Power BI To Improve Report Performance While the long-awaited small multiples feature...
-
8
Optimising The Performance Of Power Query Merges In Power BI, Part 4: Table.Join And Other Join Algorithms In the
-
8
Technical Articles
-
6
Ep. 48 Improve Performance and Save Money with Chris Harris, a.k.a. The Query Doctor Preview Mode Links will not work in...
-
6
ios Published on 3 December 2021...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK