24

Calculating Quartiles with DAX and Power BI

 5 years ago
source link: https://www.tuicool.com/articles/hit/zi2yYb3
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Like many of my blog posts, this post is inspired by some work I was doing with a customer who had a requirement to calculate quartiles of a ranked set of data. So I thought I’d take a few minutes to demonstrate how I accomplished this using DAX in Power BI. For this example, I’m going to be using the NFL dataset that I download as part of my NFL analysis dashboards I published last week .

e222ia6.png!web

Download the Power BI 2018 NFL Stats and Analysis Report Dashboard here

So for this example I’m going to calculate the quartiles for NFL football players based on yards rushing.

ny2EnyV.png!web

To calculate the quartile, we’re going to use the PERCENTILEX.INC DAX function . The PERCENTILEX.INC function returns the number at the specified percentile. So for example, if I had numbers 0 and 100 in my data set, the 25th percentile value would be 25. The 50th percentile value would be 50 and the 75th percentile value would be 75, and you can figure out what the 100th percentile value would be.

yeuMZji.png!web

If I want to break my rushers into quarters so I can understand how they compare to the rest of the rushers in the NFL at their position in a given season, calculating the quartiles could be a good way to do that.

First, I create a calculated column. And I want to write an expression that calculates the 75th percentile value of rushing yards, which is also the upper third quarter boundary.

Quartile Rank by Rushing Yards = PERCENTILEX.INC(‘Stats’,[Rush-Yds],.75)

This provides me with my 3rd quartile upper boundary.

JnIJruy.png!web

So I can tell if a player has more than 43 rushing yards, they’re in the upper quartile.

There’s a problem though. My dataset contains rushing stats for multiple years, so I’d like to calculate quartiles within the context of a single season. Also, it doesn’t make sense to compare the rushing stats of running backs to quarter back or wide receivers or tight ends. So I need to edit my calculation to only calculate the quartile for the current players position and season. To accomplish this I need to use a Filter function and the Earlier function as seen here.

Quartile Rank by Rushing Yards = PERCENTILEX.INC(
FILTER(
FILTER(‘Stats’,
[Season] = EARLIER([Season])
),
[Position] = EARLIER([Position])
)
,[Rush-Yds],.75)

The Filter function allows me to filter the Stats tables to a subset of records based on a condition and the Earlier function allows me to limit the subset of records returned from the Stats table to the current row’s season and position. I’m also filtering out players without any rushing yards

This calculation gives me the third quarter boundary. I could also create another calculation for the second quarter boundary using the following code:

PERCENTILEX.INC (
FILTER (
FILTER (
FILTER ( ‘Stats’, ISBLANK(‘Stats'[Rush-Yds]) = FALSE()),
[Season] = EARLIER ( [Season] )
),
[Position] = EARLIER ( [Position] )
),
[Rush-Yds],
.75
)

But this would require me to create multiple columns: one for the Q1 boundary, Q2 boundary, and Q3 boundary. Instead of doing that, I’ll just use a variable in DAX . I can use the variables to store the values for the different boundaries and then use the IF function to do the comparison to calculate which quarter the player is in.

Quartile Rank by Rushing Yards =
var high = PERCENTILEX.INC(
FILTER(
FILTER(
FILTER(
‘Stats’,
ISBLANK(‘Stats'[Rush-Yds]) = FALSE()
),
[Season] = EARLIER([Season])
),
[Position] = EARLIER([Position])
)
,[Rush-Yds],.75)
var mid = PERCENTILEX.INC(
FILTER(
FILTER(
FILTER(‘Stats’,ISBLANK(‘Stats'[Rush-Yds]) = FALSE()),
[Season] = EARLIER([Season])
),
[Position] = EARLIER([Position])
)
,[Rush-Yds],.5)
var low = PERCENTILEX.INC(
FILTER(
FILTER(
FILTER(‘Stats’,ISBLANK(‘Stats'[Rush-Yds]) = FALSE()),
[Season] = EARLIER([Season])
),
[Position] = EARLIER([Position])
)
,[Rush-Yds],.25)
return
IF(‘Stats'[Rush-Yds] <= low, 4, IF(‘Stats'[Rush-Yds] <= mid, 3, IF(‘Stats'[Rush-Yds] <= high, 2, 1)))

Using this method, I can tell which quartile each player is in within the context of a single season and position group. And then the fun with the charts begin.

AfyaiuB.png!web

If you’d like to download my Power BI Desktop file, you can get that here .

I hope you found that useful! Leave a comment and let me know what you thought!


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK