

Optimising The Performance Of Power Query Merges In Power BI, Part 4: Table.Join...
source link: https://blog.crossjoin.co.uk/2020/06/14/optimising-the-performance-of-power-query-merges-in-power-bi-part-4-table-join-and-other-join-algorithms/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Optimising The Performance Of Power Query Merges In Power BI, Part 4: Table.Join And Other Join Algorithms
In the previous post in this series I showed how you can use the Table.Join function and the SortMerge algorithm to get much better performance for merge operations in Power Query – but only if your data is sorted. But what if your data isn’t sorted? Can those other join algorithms give you better performance? The answer is… you guessed it, it depends.
As a reminder, the seven join algorithms that can be used with Table.Join are:
- JoinAlgorithm.Dynamic
- JoinAlgorithm.LeftHash
- JoinAlgorithm.LeftIndex
- JoinAlgorithm.PairwiseHash
- JoinAlgorithm.RightHash
- JoinAlgorithm.RightIndex
- JoinAlgorithm.SortMerge
The first thing to say is that if you don’t specify a join algorithm in the sixth parameter of Table.Join (it’s an optional parameter), Power Query will try to decide which algorithm to use based on some undocumented heuristics. The same thing also happens if you use JoinAlgorithm.Dynamic in the sixth parameter of Table.Join, or if you use the Table.NestedJoin function instead, which doesn’t allow you to explicitly specify an algorithm.
There are going to be some cases where you can get better performance by explicitly specifying a join algorithm instead of relying on JoinAlgorithm.Dynamic but you’ll have to do some thorough testing to prove it. From what I’ve seen there are lots of cases where explicitly setting the algorithm will result in worse performance, although there are enough cases where doing so results in better performance to make all that testing worthwhile.
For example, using the same CSV file that I’ve been using in my previous posts, I created two source queries called First and Second that only returned column A and the first 300,000 rows. Here’s the M code for First (the code for Second only differs in that it renames the only column to A2):
let
Source = Csv.Document(
File.Contents("C:\Users\chwebb\Documents\NumbersMoreColumns.csv"),
[Delimiter = ",", Columns = 7, Encoding = 65001, QuoteStyle = QuoteStyle.None]
),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
#"Removed Other Columns" = Table.SelectColumns(#"Promoted Headers", {"A"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns", {{"A", "A1"}}),
#"Kept First Rows" = Table.FirstN(#"Renamed Columns", 300000)
in
#"Kept First Rows"
Here’s a query that uses Table.Join and JoinAlgorithm.Dynamic to merge these two queries:
let
Source = Table.Join(First, {"A1"}, Second, {"A2"}, JoinKind.Inner, JoinAlgorithm.Dynamic)
in
Source
The average timings for this query on my PC were:
- Progress Report End/25 Execute SQL – 2.0 seconds
- Progress Report End/17 Read Data – 0.4 seconds
Changing this query to use JoinAlgorithm.LeftHash instead, like so:
let
Source = Table.Join(First, {"A1"}, Second, {"A2"}, JoinKind.Inner, JoinAlgorithm.LeftHash)
in
Source
…resulted in the following average timings:
- Progress Report End/25 Execute SQL – 0.9 seconds
- Progress Report End/17 Read Data – 0.6 seconds
An improvement of almost one second – but I’ve not included here all the other test results for algorithms that performed worse (I had to cancel the query that used JoinAlgorithm.LeftIndex because it was so slow). And just to be clear: I’m not saying that using JoinAlgorithm.LeftHash is always better than JoinAlgorithm.Dynamic, just that it happened to perform better in this case with these queries and this data. With different data and different queries then different algorithms may perform better. If you find other scenarios where specifying an algorithm improves performance then please let me know by leaving a comment.
Like this:
Recommend
-
18
Optimising The Performance Of Power Query Merges In Power BI, Part 5: Cross Joins In a late addition to the series of posts that I started
-
70
Understanding The “The key didn’t match any rows in the table” Error In Power Query In Power BI Or Excel One of the most common errors you’ll see when working with Power Query in Power BI or Excel is this: Expression.Er...
-
9
Technical Articles
-
6
Query folding in Power BI — tricks, lies & ultimate performance testIn the final part of the Query folding series, learn more tricks, find out when the View Native Query lies, and finally spot the difference betw...
-
7
ios Published on 3 December 2021...
-
4
This is going to be a multi-part Video + article tutorial series on JavaScript DOM. You’re reading Pa...
-
9
28 March, 2022 Optimising Largest Contentful Paint Written by Harry Roberts on CSS Wizardry. Table of Contents Largest Contentful Paint (LCP) is my favourite Core Web V...
-
9
optimising-unifi-performance NOTE: Content below is written by Adrian Mace. Cli...
-
7
5 mistakes to avoid when optimising your web app performance
-
4
Bálint Horváth 1 hour ago Table query performance improvement 26 Views
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK