

Chris Webb's BI Blog: A Function To Visualise Parallelism In Power Query Diagnos...
source link: https://blog.crossjoin.co.uk/2020/10/11/a-function-to-visualise-parallelism-in-power-query-diagnostics-data/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

A Function To Visualise Parallelism In Power Query Diagnostics Data
Most of the time I’ve spent looking at Power Query Query Diagnostics data, I’ve been looking at data for a single query. Over the past few days though I’ve spent some time investigating what this data shows for all the queries that are executed for a single dataset refresh. To help me do this I wrote the following M function:
(InputTable as table) =>
let
Source = InputTable,
EarliestStart = List.Min(Source[Start Time]),
AddRelativeStart = Table.AddColumn(
Source,
"Relative Start",
each [Start Time] - EarliestStart
),
AddRelativeEnd = Table.AddColumn(
AddRelativeStart,
"Relative End",
each [End Time] - EarliestStart
),
SetToDurations = Table.TransformColumnTypes(
AddRelativeEnd,
{
{"Relative Start", type duration},
{"Relative End", type duration}
}
),
CalculatedTotalSeconds
= Table.TransformColumns(
SetToDurations,
{
{
"Relative Start",
Duration.TotalSeconds,
type number
},
{
"Relative End",
Duration.TotalSeconds,
type number
}
}
),
GroupedRows = Table.Group(
CalculatedTotalSeconds,
{"Id", "Query"},
{
{
"Relative Start",
each List.Min([Relative Start]),
type number
},
{
"Relative End",
each List.Max([Relative End]),
type number
}
}
),
MergeColumns = Table.AddColumn(
GroupedRows,
"Id Query",
each Text.Combine({[Id], [Query]}, " "),
type text
),
IdToNumber = Table.TransformColumnTypes(
MergeColumns,
{{"Id", type number}}
)
in
IdToNumber
If you invoke this function and pass in the “Detailed” query diagnostics data query:
…you’ll end up with a query that gives you a table that looks something like this:
This table has one row for each Power Query query that got executed while the diagnostics trace was running, an Id column to uniquely identify each execution, the name of the query executed, an Id Query column that concatenates the previous two columns, and Relative Start and Relative End columns that give you the number of seconds from the start time of the first query executed up to the start time and end time of the query on the current row.
Loading the data into Power BI allows you to build a stacked bar chart with Id Query on the axis and Relative Start and Relative End together in the values:
If you then set the Sort By Column property of Id Query to the Id column (which is numeric, so this ensures that the values in Id Query are sorted correctly) and set the Data Color property of the Relative Start values to white (or whatever the background of your chart is) so that it’s invisible, then finally set the axis of your stacked bar chart to be sorted by Id Query in ascending order, you get a visual something like this:
This is a kind of Gantt chart where the x axis shows the number of seconds since the start of the first query execution and each bar shows the start and end times of each query, making it easy to see which queries are executing in parallel.
“What can this show us?” I hear you say. Well, that’s something for a future blog post. All that I can say right now is that it’s COMPLICATED and I don’t properly understand it myself yet. But it’s undoubtedly interesting.
Like this:
Recommend
-
55
README.md Matomo (formerly Piwik) - matomo.org
-
29
README.md Mesh Spreadsheet Mesh is a JavaScript code editor that feels like a spreadsheet. Specifically, Mesh is a spreadsheet UI wrap...
-
14
Introduction When PostgreSQL processes your SQL query, it creates a query plan and then executes it. If the PostgreSQL plan includes executing parts of your query in parallel, it has the potential to speed up y...
-
38
The M Behind The New Power Query Data Types In Excel The big news this week – at least for me – was the release of the new Power Query data types to the Excel insiders channel. You can read all about it here:
-
17
Optimising The Performance Of Power Query Merges In Power BI, Part 5: Cross Joins In a late addition to the series of posts that I started
-
25
Power Query Geography And Geometry Functions In Power BI And Excel In the August 2020 release of Power BI Desktop a couple of new Power Query functions were added:
-
3
TL;DR 原文:Morsel-Driven Parallelism: A NUMA-Aware Query Evaluation Framework for the Many-Core Age
-
6
“The problem is that our true self, our brain, has changed.” via St. Martin's Press By
-
4
DirectQuery parallelism in Power BI Recently we announced an important new optimisation for DirectQuery datasets: the ability to run (some) of the queries generated by a single DAX query in parallel. You can read...
-
3
PostgreSQL 16 advances query parallelism Next major release of the major open source relational database, now in beta, brings performance improvements to...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK