MDX calculation performance

Optimizing the performance of the Storage Engine is relatively straightforward: you can diagnose performance problems easily and you only have two options—partitioning and aggregation—for solving them. Optimizing the performance of the Formula Engine is much more complicated because it requires knowledge of MDX, diagnosing performance problems is difficult because the internal workings of the Formula Engine are hard to follow, and solving the problem is reliant on knowing tips and tricks that may change from service pack to service pack.

Diagnosing Formula Engine performance problems

If you have a poorly-performing query, and if you can rule out the Storage Engine as the cause of the problem, then the issue is with the Formula Engine. We’ve already seen how we can use Profiler to check the performance of Query Subcube events, to see which partitions are being hit and to check whether aggregations are being used; if you subtract the sum of the durations of all the Query Subcube events from the duration of the query as a whole, you’ll get the amount of time spent in the Formula Engine. You can use MDX Studio’s Profile functionality to do the same thing much more easily—here’s a screenshot of what it outputs when a calculation-heavy query is run:

The following blog entry describes this functionality in detail: http://tinyurl.com/mdxtrace. What this screenshot displays is essentially the same thing that we’d see if we ran a Profiler trace when running the same query on a cold and warm cache, but in a much easier-to-read format. The column to look at here is the Ratio to Total, which shows the ratio of the duration of each event to the total duration of the query. We can see that on both a cold cache and a warm cache the query took almost ten seconds to run, but none of the events recorded took anywhere near that amount of time: the highest Ratio to Total is 0.09%. This is typical of what you’d see with a Formula Engine-bound query.

Another hallmark of a query that spends most of its time in the Formula Engine is that it will only use one CPU, even on a multiple-CPU server. This is because the Formula Engine, unlike the Storage Engine, is single-threaded. As a result if you watch CPU usage in Task Manager while you run a query you can get a good idea of what’s happening internally: high usage of multiple CPUs indicates work is taking place in the Storage Engine, while high usage of one CPU indicates work is taking place in the Formula Engine.

Calculation performance tuning

Having worked out that the Formula Engine is the cause of a query’s poor performance, the next step is, obviously, to try to tune the query. In some cases you can achieve impressive performance gains (sometimes of several hundred percent) simply by rewriting a query and the calculations it depends on; the problem is knowing how to rewrite the MDX and working out which calculations contribute most to the overall query duration. Unfortunately, Analysis Services doesn’t give you much information to use to solve this problem and there are very few tools out there which can help either, so doing this is something of a black art.

There are three main ways you can improve the performance of the Formula Engine: tune the structure of the cube it’s running on, tune the algorithms you’re using in your MDX, and tune the implementation of those algorithms so they use functions and expressions that Analysis Services can run efficiently. We’ve already talked in depth about how the overall cube structure is important for the performance of the Storage Engine, and the same goes for the Formula Engine; the only thing to repeat here is the recommendation that if you can move a calculation out of MDX to an earlier stage, for example into your ETL or your relational source, without compromising functionality, you should do so. We’ll now go into more detail about tuning algorithms and implementations.

Mosha Pasumansky’s blog, http://tinyurl.com/moshablog, is a goldmine of information on this subject. If you’re serious about learning MDX we recommend that you subscribe to it and read everything he’s ever written.

Tuning algorithms used in MDX

Tuning an algorithm in MDX is much the same as tuning an algorithm in any other kind of programming language—it’s more a matter of understanding your problem and working out the logic that provides the most efficient solution than anything else. That said, there are some general techniques that can be used often in MDX and which we will walk through here.

Using named sets to avoid recalculating set expressions

Many MDX calculations involve expensive set operations, a good example being rank calculations where the position of a tuple within an ordered set needs to be determined. The following query displays Dates on the Rows axis and, on columns, shows a calculated measure that returns the rank of each date within the set of all dates, based on the value of the Internet Sales Amount measure:

WITH
MEMBER MEASURES.MYRANK AS
Rank
(
[Date].[Date].CurrentMember
,Order
(
[Date].[Date].[Date].MEMBERS
,[Measures].[Internet Sales Amount]
,BDESC
)
)
SELECT
MEASURES.MYRANK ON 0
,[Date].[Date].[Date].MEMBERS ON 1
FROM [Adventure Works]

It runs very slowly, and the problem is that every time the calculation is evaluated it has to evaluate the Order function to return the set of ordered dates. In this particular situation, though, you can probably see that the set returned will be the same every time the calculation is called, so it makes no sense to do the ordering more than once. Instead, we can create a named set to hold the ordered set and refer to that named set from within the calculated measure, as follows:

WITH
SET ORDEREDDATES AS
Order
(
[Date].[Date].[Date].MEMBERS
,[Measures].[Internet Sales Amount]
,BDESC
)
MEMBER MEASURES.MYRANK AS
Rank
(
[Date].[Date].CurrentMember
,ORDEREDDATES
)
SELECT
MEASURES.MYRANK ON 0
,[Date].[Date].[Date].MEMBERS ON 1
FROM [Adventure Works]

This version of the query is many times faster, simply as a result of improving the algorithm used; the problem is explored in more depth in this blog entry: http://tinyurl.com/mosharank

Since normal named sets are only evaluated once, they can be used to cache set expressions in some circumstances; however, the fact that they are static means they can be too inflexible to be useful most of the time. Note that normal named sets defined in the MDX Script are only evaluated once, when the MDX Script executes and not in the context of any particular query, so it wouldn’t be possible to change the example above so that the set and calculated measure were defined on the server. Even named sets defined in the WITH clause are evaluated only once, in the context of the WHERE clause, so it wouldn’t be possible to crossjoin another hierarchy on columns and use this approach, because for it to work the set would have to be reordered once for each column.

The introduction of dynamic named sets in Analysis Services 2008 improves the situation a little, and other more advanced techniques can be used to work around these issues, but in general named sets are less useful than you might hope. For further reading on this subject see the following blog posts:

http://tinyurl.com/chrisrank
http://tinyurl.com/moshadsets
http://tinyurl.com/chrisdsets
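
As a brief illustration of what a dynamic named set looks like (this is a minimal sketch, not an example from the text above; the set name is made up), the following statement could be added to a cube’s MDX Script in Analysis Services 2008. Unlike a static named set, a dynamic named set is re-evaluated in the context of each query that uses it:

CREATE DYNAMIC SET CURRENTCUBE.[Top Ten Dates] AS
TopCount
(
[Date].[Date].[Date].MEMBERS
,10
,[Measures].[Internet Sales Amount]
);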

Using calculated members to cache numeric values

In the same way that you can avoid unnecessary re-evaluations of set expressions by using named sets, you can also rely on the fact that the Formula Engine can (usually) cache the result of a calculated member to avoid recalculating expressions which return numeric values. What this means in practice is that anywhere in your code you see an MDX expression that returns a numeric value repeated across multiple calculations, you should consider abstracting it to its own calculated member; not only will this help performance, but it will improve the readability of your code. For example, take the following slow query which includes two calculated measures:

WITH
MEMBER [Measures].TEST1 AS
[Measures].[Internet Sales Amount]
/
Count
(
TopPercent
(
{
[Scenario].[Scenario].&[1]
,[Scenario].[Scenario].&[2]
}*
[Account].[Account].[Account].MEMBERS*
[Date].[Date].[Date].MEMBERS
,10
,[Measures].[Amount]
)
)
MEMBER [Measures].TEST2 AS
[Measures].[Internet Tax Amount]
/
Count
(
TopPercent
(
{
[Scenario].[Scenario].&[1]
,[Scenario].[Scenario].&[2]
}*
[Account].[Account].[Account].MEMBERS*
[Date].[Date].[Date].MEMBERS
,10
,[Measures].[Amount]
)
)
SELECT
{
[Measures].TEST1
,[Measures].TEST2
} ON 0
,[Customer].[Gender].[Gender].MEMBERS ON 1
FROM [Adventure Works]

A quick glance over the code shows that a large section of it, everything inside the Count function, occurs in both calculations. If we move that code into its own calculated member, as follows:

WITH
MEMBER [Measures].Denominator AS
Count
(
TopPercent
(
{
[Scenario].[Scenario].&[1]
,[Scenario].[Scenario].&[2]
}*
[Account].[Account].[Account].MEMBERS*
[Date].[Date].[Date].MEMBERS
,10
,[Measures].[Amount]
)
)
MEMBER [Measures].TEST1 AS
[Measures].[Internet Sales Amount] / [Measures].Denominator
MEMBER [Measures].TEST2 AS
[Measures].[Internet Tax Amount] / [Measures].Denominator
SELECT
{
[Measures].TEST1
,[Measures].TEST2
} ON 0
,[Customer].[Gender].[Gender].MEMBERS ON 1
FROM [Adventure Works]

The query runs much faster, simply because instead of evaluating the count twice, once for each of the two visible calculated measures, we evaluate it once, cache the result in the calculated measure Denominator, and then reference this in the other two calculated measures.

It’s also possible to find situations where you can rewrite code to avoid evaluating a calculation that always returns the same result over different cells in the multidimensional space of the cube. This is much more difficult to do effectively though; the following blog entry describes how to do it in detail: http://tinyurl.com/fecache

Tuning the implementation of MDX

Like just about any other software product, Analysis Services is able to do some things more efficiently than others. It’s possible to write the same query or calculation using the same algorithm but using different MDX functions and see a big difference in performance; as a result, we need to know which are the functions we should use and which ones we should avoid. Which ones are these though? Luckily MDX Studio includes functionality to analyse MDX code and flag up such problems—to do this you just need to click the Analyze button—and there’s even an online version of MDX Studio that allows you to do this too, available at: http://mdx.mosha.com/. We recommend that you run any MDX code you write through this functionality and take its suggestions on board. Mosha walks through an example of using MDX Studio to optimise a calculation on his blog here: http://tinyurl.com/moshaprodvol

Block computation versus cell-by-cell

When the Formula Engine has to evaluate an MDX expression for a query it can basically do so in one of two ways. It can evaluate the expression for each cell returned by the query, one at a time, an evaluation mode known as “cell-by-cell”; or it can try to analyse the calculations required for the whole query and find situations where the same expression would need to be calculated for multiple cells and instead do it only once, an evaluation mode known variously as “block computation” or “bulk evaluation”. Block computation is only possible in some situations, depending on how the code is written, but is often many times more efficient than cell-by-cell mode. As a result, we want to write MDX code in such a way that the Formula Engine can use block computation as much as possible, and when we talk about using efficient MDX functions or constructs then this is what we in fact mean.

Given that different calculations in the same query, and different expressions within the same calculation, can be evaluated using block computation and cell-by-cell mode, it’s very difficult to know which mode is used when. Indeed, in some cases Analysis Services can’t use block mode anyway, so it’s hard to know whether we have written our MDX in the most efficient way possible. One of the few indicators we have is the Perfmon counter MDX\Total Cells Calculated, which basically returns the number of cells in a query that were calculated in cell-by-cell mode; if a change to your MDX means this value increases by a smaller amount than before, and the query runs faster, you’re doing something right.
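
To give a flavour of the difference (this is a minimal illustrative sketch against Adventure Works, not an example from the text above), the two queries below return the same rows, but the first, which uses Filter and IsEmpty, is likely to be evaluated cell-by-cell for every customer, while the second, which uses NonEmpty, can be evaluated in block mode:

SELECT
{[Measures].[Internet Sales Amount]} ON 0
,Filter
(
[Customer].[Customer].[Customer].MEMBERS
,NOT IsEmpty([Measures].[Internet Sales Amount])
) ON 1
FROM [Adventure Works]

SELECT
{[Measures].[Internet Sales Amount]} ON 0
,NonEmpty
(
[Customer].[Customer].[Customer].MEMBERS
,{[Measures].[Internet Sales Amount]}
) ON 1
FROM [Adventure Works]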

The list of rules that MDX Studio applies is too long to list here, and in any case it is liable to change in future service packs or versions; another good guide for Analysis Services 2008 best practices exists in the Books Online topic Performance Improvements for MDX in SQL Server 2008 Analysis Services, available online here: http://tinyurl.com/mdximp. However, there are a few general rules that are worth highlighting:

  • Don’t use the Non_Empty_Behavior calculation property in Analysis Services 2008, unless you really know how to set it and are sure that it will provide a performance benefit. It was widely misused with Analysis Services 2005 and most of the work that went into the Formula Engine for Analysis Services 2008 was to ensure that it wouldn’t need to be set for most calculations. This is something that needs to be checked if you’re migrating an Analysis Services 2005 cube to 2008.
  • Never use late binding functions such as LookupCube, or StrToMember or StrToSet without the Constrained flag, inside calculations, since they have a serious negative impact on performance. It’s almost always possible to rewrite calculations so they don’t need to be used; in fact, the only valid use for StrToMember or StrToSet in production code is when using MDX parameters (see the sketch after this list). The LinkMember function suffers from a similar problem but is harder to avoid using.
  • Use the NonEmpty function wherever possible; it can be much more efficient than using the Filter function or other methods. Never use NonEmptyCrossjoin either: it’s deprecated, and everything you can do with it you can do more easily and reliably with NonEmpty.
  • Lastly, don’t assume that whatever worked best for Analysis Services 2000 or 2005 is still best practice for Analysis Services 2008. In general, you should always try to write the simplest MDX code possible initially, and then only change it when you find performance is unacceptable. Many of the tricks that existed to optimise common calculations for earlier versions now perform worse on Analysis Services 2008 than the straightforward approaches they were designed to replace.
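
As an illustration of the StrToMember recommendation above (a hedged sketch; the parameter name @CountryParameter is made up and would be supplied by a client tool such as Reporting Services), the Constrained flag tells Analysis Services that the string must resolve directly to a member name, which is both safer and, in general, better for performance than letting it parse an arbitrary MDX expression:

SELECT
{[Measures].[Internet Sales Amount]} ON 0
,[Date].[Date].[Date].MEMBERS ON 1
FROM [Adventure Works]
WHERE(StrToMember(@CountryParameter, CONSTRAINED))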

Caching

We’ve already seen how Analysis Services can cache the values returned in the cells of a query, and how this can have a significant impact on the performance of a query. Both the Formula Engine and the Storage Engine can cache data, but may not be able to do so in all circumstances; similarly, although Analysis Services can share the contents of the cache between users there are several situations where it is unable to do so. Given that in most cubes there will be a lot of overlap in the data that users are querying, caching is a very important factor in the overall performance of the cube, and as a result it’s a good idea to ensure that as much caching as possible is taking place.

Formula cache scopes

There are three different cache contexts within the Formula Engine, which relate to how long data can be stored within the cache and how that data can be shared between users:

  • Query Context, which means that the results of calculations can only be cached for the lifetime of a single query and so cannot be reused by subsequent queries or by other users.
  • Session Context, which means the results of calculations are cached for the lifetime of a session and can be reused by subsequent queries in the same session by the same user (see the sketch after this list).
  • Global Context, which means the results of calculations are cached until the cache has to be dropped because data in the cube has changed (usually when some form of processing takes place on the server). These cached values can be reused by subsequent queries run by other users as well as the user who ran the original query.
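
For example (a minimal sketch; the member name is made up), a session-scoped calculation is one created with a statement like the following, executed on its own rather than in a query’s WITH clause or in the cube’s MDX Script; its results can then be cached in the Session Context and reused by subsequent queries in the same session:

CREATE MEMBER [Adventure Works].[Measures].[My Average Sales Value] AS
[Measures].[Internet Sales Amount] / [Measures].[Internet Order Quantity];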

Clearly the Global Context is the best from a performance point of view, followed by the Session Context and then the Query Context; Analysis Services will always try to use the Global Context wherever possible, but it is all too easy to accidentally write queries or calculations that force the use of the Session Context or the Query Context. Here’s a list of the most important situations when that can happen:

  • If you define any calculations (not including named sets) in the WITH clause of a query, even if you do not use them, then Analysis Services can only use the Query Context (see http://tinyurl.com/chrisfecache for more details).
  • If you define session-scoped calculations but do not define calculations in the WITH clause, then the Session Context must be used.
  • Using a subselect in a query will force the use of the Query Context (see http://tinyurl.com/chrissubfe and the sketch after this list).
  • Use of the CREATE SUBCUBE statement will force the use of the Session Context.
  • When a user connects to a cube using a role that uses cell security, then the Query Context will be used.
  • When calculations are used that contain non-deterministic functions (functions which could return different results each time they are called), for example the Now() function that returns the system date and time, the Username() function, or any Analysis Services stored procedure, the Query Context will be used.
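
As a simple illustration of the subselect point above (a sketch using Adventure Works; for a straightforward query like this the two forms return the same cell value, although subselects and WHERE clauses are not always interchangeable), the first query below forces the Query Context while the second can make use of the Global Context:

SELECT
{[Measures].[Internet Sales Amount]} ON 0
FROM
(
SELECT {[Customer].[Country].&[Australia]} ON 0
FROM [Adventure Works]
)

SELECT
{[Measures].[Internet Sales Amount]} ON 0
FROM [Adventure Works]
WHERE([Customer].[Country].&[Australia])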

Other scenarios that restrict caching

Apart from the restrictions imposed by cache context, there are other scenarios where caching is either turned off or restricted.

When arbitrary-shaped sets are used in the WHERE clause of a query, no caching at all can take place in either the Storage Engine or the Formula Engine. An arbitrary-shaped set is a set of tuples that cannot be created by a crossjoin, for example:

{
([Customer].[Country].&[Australia], [Product].[Category].&[1])
,([Customer].[Country].&[Canada], [Product].[Category].&[3])
}
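
By way of contrast (an illustrative sketch using the same members), the following set is not arbitrary-shaped, because it can be expressed as a crossjoin of two sets that each come from a single hierarchy:

{[Customer].[Country].&[Australia], [Customer].[Country].&[Canada]}
*
{[Product].[Category].&[1], [Product].[Category].&[3]}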

If your users frequently run queries that use arbitrary-shaped sets then this can represent a very serious problem, and you should consider redesigning your cube to avoid it. The following blog entries discuss this problem in more detail:

http://tinyurl.com/tkarbset
http://tinyurl.com/chrisarbset

Even within the Global Context, the presence of security can affect the extent to which cache can be shared between users. When dimension security is used the contents of the Formula Engine cache can only be shared between users who are members of roles which have the same permissions. Worse, the contents of the Formula Engine cache cannot be shared between users who are members of roles which use dynamic security at all, even if those users do in fact share the same permissions.

Cache warming

Since we can expect many of our queries to run instantaneously on a warm cache, and the majority at least to run faster on a warm cache than on a cold cache, it makes sense to preload the cache with data so that when users come to run their queries they will get warm-cache performance. There are two basic ways of doing this: running CREATE CACHE statements, and automatically running batches of queries.

Create Cache statement

The CREATE CACHE statement allows you to load a specified subcube of data into the Storage Engine cache. Here’s an example of what it looks like:

CREATE CACHE FOR [Adventure Works] AS
(
{[Measures].[Internet Sales Amount]}
,[Customer].[Country].[Country].MEMBERS
,[Date].[Calendar Year].[Calendar Year].MEMBERS
)

More detail on this statement can be found here: http://tinyurl.com/createcache

CREATE CACHE statements can be added to the MDX Script of the cube so that they execute every time the MDX Script is executed, although if the statements take a long time to execute (as they often do) this might not be a good idea. They can also be run after processing has finished, either from an Integration Services package using an Execute SQL task or through ASCMD; this is a much better option because it gives you much more control over when the statements actually execute. You wouldn’t want them running every time you cleared the cache, for instance.

Running batches of queries

The main drawback of the CREATE CACHE statement is that it can only be used to populate the Storage Engine cache, and in many cases it’s warming the Formula Engine cache that makes the biggest difference to query performance. The only way to do this is to automate the execution of large batches of MDX queries (potentially captured by running a Profiler trace while users go about their work) that return the results of calculations and so warm the Formula Engine cache. This automation can be done in a number of ways, for example by using the ASCMD command line utility which is part of the sample code for Analysis Services that Microsoft provides (available for download here: http://tinyurl.com/sqlprodsamples); another common option is to use an Integration Services package to run the queries, as described in the following blog entries: http://tinyurl.com/chriscachewarm and http://tinyurl.com/allancachewarm
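
As a simple illustration (a sketch only; [Measures].[Internet Gross Profit Margin] is assumed to be one of the calculated measures defined in the standard Adventure Works MDX Script), a cache-warming query should reference the calculations and the sets of members your users actually query, so that the calculated values end up in the Formula Engine cache:

SELECT
{[Measures].[Internet Gross Profit Margin]} ON 0
,[Customer].[Country].[Country].MEMBERS ON 1
FROM [Adventure Works]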

This approach is not without its own problems, though: it can be very difficult to make sure that the queries you’re running return all the data you want to load into cache, and even when you have done that, user query patterns change over time so ongoing maintenance of the set of queries is important.

Scale-up and scale-out

Buying better or more hardware should be your last resort when trying to solve query performance problems: it’s expensive and you need to be completely sure that it will indeed improve matters. Adding more memory will increase the space available for caching but nothing else; adding more or faster CPUs will lead to faster queries but you might be better off investing time in building more aggregations or tuning your MDX. Scaling up as much as your hardware budget allows is a good idea, but may have little impact on the performance of individual problem queries unless you badly under-specified your Analysis Services server in the first place.

If your query performance degenerates as the number of concurrent users running queries increases, consider scaling out by implementing what’s known as an OLAP farm. This architecture is widely used in large implementations and involves running multiple Analysis Services instances on different servers and using network load balancing to distribute user queries between them. Each of these instances needs to have the same database on it, and each of these databases must contain exactly the same data for queries to be answered consistently. This means that, as the number of concurrent users increases, you can easily add new servers to handle the increased query load. It also has the added advantage of removing a single point of failure, so if one Analysis Services server fails then the others take on its load automatically.

Making sure that data is the same across all servers is a complex operation and you have a number of different options for doing this: you can either use the Analysis Services database synchronisation functionality, copy and paste the data from one location to another using a tool like Robocopy, or use the new Analysis Services 2008 shared scalable database functionality. The following white paper from the SQLCat team describes how the first two options can be used to implement a network load-balanced solution for Analysis Services 2005: http://tinyurl.com/ssasnlb.

Shared scalable databases have a significant advantage over synchronisation and file-copying in that they don’t need to involve any moving of files at all. They can be implemented using the same approach described in the white paper above, but instead of copying the databases between instances you process a database (attached in ReadWrite mode) on one server, detach it from there, and then attach it in ReadOnly mode to one or more user-facing servers for querying while the files themselves stay in one place. You do, however, have to ensure that your disk subsystem does not become a bottleneck as a result.

Summary

In this article we covered MDX calculation performance: how to diagnose Formula Engine problems and how to write MDX so that the Formula Engine works as efficiently as possible. We’ve also seen how important caching is to overall query performance and what we need to do to ensure that data is cached as often as possible, and we’ve discussed how to scale out Analysis Services using network load balancing to handle large numbers of concurrent users.
