5

Query Performance vs Query Logging

 3 years ago
source link: https://www.cubecoder.com/query-performance-vs-query-logging/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Query Performance vs Query Logging

November 2, 2016November 3, 2016 by TimG

I’ve been working on tuning aggregate views with a test copy of a large ASO cube to which I added some additional dimensions. Tuning aggregate views can be tricky, because aggregate views make query performance heavily dependent on the exact combination of levels being queried. And there’s always some user that comes along with an unusual query that happens to hit a combination of levels that performs particularly poorly. So when I handed the system over for front-end certification by users, I enabled query logging. By parsing the log for “worst case” query times, I could proactively monitor and then investigate any particularly nasty cases the users encountered. Unfortunately, I was being a little bit too smart for my own good.

Some users of this system are performing ad hoc queries via Smart View and the Dodeca Excel Add-In. However, many users are using highly-customized front-end application. A quirk of this front-end is that it uses Report Scripts to extract data – and even metadata. So, for example, there are member selection drop-downs which are (behind the scenes) being populated via the output of Essbase Report scripts. Here’s an example of a code snippet from one such report (anonymized, obviously):

{SUPMISSINGROWS}
<LINK(<GEN("BigDim",4) AND (<DESCENDANTS("Gen1Member")))

If you’re not familiar with Report Scripts, all this is doing is getting a list of generation 4 (equivalent to level 0 in this cube) members from BigDim which are descendants of Gen1Member and suppressing any which have no data. In this case it returned a handful of members.

Partway through testing users reported that the screens which used these queries in the background were taking a very long time to open. The front-end developers sent over the problem code and I started my own testing. Sure enough, the query took much longer on my test cube than it did in production; ~20 seconds vs ~1 second. No problem, I thought – I’ll create an aggregate view specifically to support this query. Doing that is the topic of another post, but it’s a reliable if fiddly technique I have used many times in the past. I figured out the ID relating to the view I wanted, executed it against the database, and… …NOTHING! The query still ran just as slowly.

At this point I was started to get concerned. I dug in further, looking at the database stats in production vs test, the exact aggregate views on both, the cache settings, the .cfg file settings, the hardware configuration, patch builds, even the kernel queries statistic and query cost being generated by the query on each. My only progress was in figuring out that if I used LEAVES rather than the combination of DESCENDANTS, GEN and SUPMISSINGROWS to get the populated leaf members the query was much faster. But unfortunately, the front-end doesn’t support that syntax.

It was only then that I remembered my brilliant decision to enable query logging. Sure, the documentation warned me:

When query logging is enabled, queries to the database may be slower.

But I couldn’t remember ever seeing a really significant impact. I mean, all it’s doing is writing the details of the query to a flat file, right? However, I wanted to be diligent in eliminating every variable that I could, so I turned it off and restarted the app. And sure enough, test instantly performed exactly as well as production.

At this point, I started looking at the query log files themselves in more detail. I hadn’t looked at them before – why would I? Their only purpose was the capture “problem” queries, and the front-end development team had already supplied them to me. But the logs confirmed what I’d (belatedly) started to suspect the problem was.

Here’s that Report Script snippet again:

{SUPMISSINGROWS}
<LINK(<GEN("BigDim",4) AND (<DESCENDANTS("Gen1Member")))

Now this query only returns a handful of rows. But query logging doesn’t see it that way at all. “Gen1Member” actually has approaching 100,000 generation 4 (level 0) descendants. And each time that Report Script is executed, query logging is writing every last one of them to the query log file in XML format. That was where the time was going, not in processing the query itself!

I’ve learned a couple of things from this, one specific / technical and one general:

  1. Be careful with query logging – some queries / configurations will produce a lot of output, even if the actual result set is small
  2. Avoiding complacency is a constant battle. I’d never seen the supposed performance impact of query logging, so I discounted it before even looking at the logs (which already contained the information I needed, and would have set me on the right track immediately)

Posted in: ASO, Essbase | Tagged: aggregate views, ASO, Essbase


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK