Performance Problems With MDX Calculated Measures That Return Constants In Analy...

Recently I was working on an MDX calculation that calculated a value at a low granularity and aggregated the result – basically what I described in myblog post here. Here’s a simplified version of the calculation written on a cube built from Adventure Works data running on my local SSAS 2017 MD instance:

SCOPE([Measures].[Tax Amount]);
    SCOPE(
	[Customer].[Customer].[Customer].MEMBERS, 
	[Date].[Date].[Date].MEMBERS, 
	[Product].[Product].[Product].MEMBERS);
        THIS = [Measures].[Sales Amount]*0.08;
    END SCOPE;
END SCOPE;

All it does is take the value of the Sales Amount measure at the lowest granularities of the Customer, Date and Product dimensions, multiply it by 0.08 to find a tax value, and because [Tax Amount] is a real, non-calculated measure, the result of the calculation aggregates up through the cube. [I know that I don’t have to aggregate the result of this specific calculation but remember that this is a simplified example – in the real case I did have to write the calculation using Scope statements – and anyway the best way of handling a basic multiplication like this would be with ameasure expression]

The performance was sub-second for my test query and I was happy, but then I realised that the same tax rate was being used in other calculations and may change in the future, so I thought I would store the value 0.08 in a calculated measure:

CREATE MEMBER CURRENTCUBE.[Measures].[Tax Rate] AS  
0.08;

SCOPE([Measures].[Tax Amount]);
    SCOPE(
	[Customer].[Customer].[Customer].MEMBERS, 
	[Date].[Date].[Date].MEMBERS, 
	[Product].[Product].[Product].MEMBERS);
        THIS = [Measures].[Sales Amount]*[Measures].[Tax Rate];
    END SCOPE;
END SCOPE;

Surely a simple change? But no, as soon as I did this my query ran for several minutes and memory usage went through the roof until the query was automatically cancelled:

A7ziEnU.png!web

Clearly the SSAS MD Formula Engine could optimise the version with the hard-coded constant value but could not optimise the version with the calculated measure. There was nothing in Profiler to indicate the calculation was being evaluated in cell-by-cell mode though.

So I tried another variation:

CREATE MEMBER CURRENTCUBE.[Measures].[Tax Rate] AS  
IIF([Measures].[Sales Amount]=0, NULL, 0.08);

SCOPE([Measures].[Tax Amount]);
    SCOPE(
	[Customer].[Customer].[Customer].MEMBERS, 
	[Date].[Date].[Date].MEMBERS, 
	[Product].[Product].[Product].MEMBERS);
        THIS = [Measures].[Sales Amount]*[Measures].[Tax Rate];
    END SCOPE;
END SCOPE;

This time the memory usage was completely flat but the query was still so slow had to be cancelled. Next, I thought I’d try setting the NON_EMPTY_BEHAVIOR property:

CREATE MEMBER CURRENTCUBE.[Measures].[Tax Rate] AS  
0.08
, NON_EMPTY_BEHAVIOR = [Measures].[Sales Amount];

SCOPE([Measures].[Tax Amount]);
    SCOPE(
	[Customer].[Customer].[Customer].MEMBERS, 
	[Date].[Date].[Date].MEMBERS, 
	[Product].[Product].[Product].MEMBERS);
        THIS = [Measures].[Sales Amount]*[Measures].[Tax Rate];
    END SCOPE;
END SCOPE;

And guess what? The query went back to being sub-second. Strange. Now the NON_EMPTY_BEHAVIOR property isvery dangerous and this is a clear example of how it should not be set – the expression 0.08 is never going to return an empty value, regardless of the value of the [Sales Amount] measure. As a result I would be very wary of using this trick in production in case it ended up returning inconsistent results. It’s also worth noting that the following calculation, which is a correct use of NON_EMPTY_BEHAVIOR, is as slow as the other examples above:

CREATE MEMBER CURRENTCUBE.[Measures].[Tax Rate] AS  
IIF([Measures].[Sales Amount]=0, NULL, 0.08)
, NON_EMPTY_BEHAVIOR = [Measures].[Sales Amount];

Finally I created a dummy table in my DSV with one row and one column to hold the 0.08 value, and then created a real, non-calculated measure from the column. When I used this measure in my calculation then performance of my test query was also sub-second.

So, to sum up, it looks like hard-coding constant values in calculated measures is a bad idea, at least in cases like this, and either using the values themselves in your MDX calculations or creating a table and non-calculated measure specifically to hold the value is better for performance.

Recommend

美团扫码付小程序的优化实践

ARKit：增强现实技术在美团到餐业务的实践

本周值得读的15篇NLP/CV/ML论文（附开源代码）

镣铐之舞：美团安全工程师Black Hat USA演讲

Analyzing Amazon Aurora Slow Logs with pt-query-digest

前端异常监控解决方案研究

The 42M Record kayo.moe Credential Stuffing Data

标准共识：NEM，可降低用户使用区块链门槛

Python与SQL对比：处理用户记录相邻的时间差

Android 系统不释放内存吗？

About Joyk