How to Perform Iteration on Sets in MDX

0
194
5 min read

 

MDX with Microsoft SQL Server 2008 R2 Analysis Services Cookbook

MDX with Microsoft SQL Server 2008 R2 Analysis Services Cookbook

More than 80 recipes for enriching your Business Intelligence solutions with high-performance MDX calculations and flexible MDX queries in this book and eBook

        Read more about this book      

(For more resources on Microsoft SQL Server, see here.)

 

Iteration is a very natural way of thinking for us humans. We set a starting point, we step into a loop, and we end when a condition is met. While we’re looping, we can do whatever we want: check, take, leave, and modify items in that set. Being able to break down the problems in steps makes us feel that we have things under control. However, by breaking down the problem, the query performance often breaks down as well. Therefore, we have to be extra careful with iterations when data is concerned.

If there’s a way to manipulate the collection of members as one item, one set, without cutting that set into small pieces and iterating on individual members, we should use it. It’s not always easy to find that way, but we should at least try.

 

Iterating on a set in order to reduce it

Getting ready

Start a new query in SSMS and check that you’re working on the right database. Then write the following query:

SELECT
{ [Measures].[Customer Count],
[Measures].[Growth in Customer Base] } ON 0,
NON EMPTY
{ [Date].[Fiscal].[Month].MEMBERS } ON 1
FROM
[Adventure Works]WHERE
( [Product].[Product Categories].[Subcategory].&[1] )

The query returns fiscal months on rows and two measures: a count of customers and their growth compared to the previous month. Mountain bikes are in slicer.

Now let’s see how we can get the number of days the growth was positive for each period.

How to do it…

Follow these steps to reduce the initial set:

  1. Create a new calculated measure in the query and name it Positive growth days.
  2. Specify that you need descendants of current member on leaves.
  3. Wrap around the FILTER() function and specify the condition which says that the growth measure should be greater than zero.
  4. Apply the COUNT() function on a complete expression to get count of days.
  5. The new calculated member’s definition should look as follows, verify that it does.

    WITH
    MEMBER [Measures].[Positive growth days] AS
    FILTER(
    DESCENDANTS([Date].[Fiscal].CurrentMember, , leaves),
    [Measures].[Growth in Customer Base] > 0
    ).COUNT

  6. Add the measure on columns.
  7. Run the query and observe if the results match the following image:

    MDX with Microsoft SQL Server 2008 R2 Analysis Services Cookbook

How it works…

The task says we need to count days for each time period and use only positive ones. Therefore, it might seem appropriate to perform iteration, which, in this case, can be performed using the FILTER() function.

But, there’s a potential problem. We cannot expect to have days on rows, so we must use the DESCENDANTS() function to get all dates in the current context.

Finally, in order to get the number of items that came up upon filtering, we use the COUNT function.

There’s more…

Filter function is an iterative function which doesn’t run in block mode, hence it will slow down the query. In the introduction, we said that it’s always wise to search for an alternative if available. Let’s see if something can be done here. A keen eye will notice a “count of filtered items” pattern in this expression. That pattern suggests the use of a set-based approach in the form of SUM-IF combination. The trick is to provide 1 for the True part of the condition taken from the FILTER() statement and null for the False part. The sum of one will be equivalent to the count of filtered items.

In other words, once rewritten, that same calculated member would look like this:

MEMBER [Measures].[Positive growth days] AS
SUM(
Descendants([Date].[Fiscal].CurrentMember, , leaves),
IIF( [Measures].[Growth in Customer Base] > 0, 1, null)
)

Execute the query using the new definition. Both the SUM() and the IIF() functions are optimized to run in the block mode, especially when one of the branches in IIF() is null. In this particular example, the impact on performance was not noticeable because the set of rows was relatively small. Applying this technique on large sets will result in drastic performance improvement as compared to the FILTER-COUNT approach. Be sure to remember that in future.

More information about this type of optimization can be found in Mosha Pasumansky’s blog:

http://tinyurl.com/SumIIF

Hints for query improvements

There are several ways you can avoid the FILTER() function in order to improve performance.

  • When you need to filter by non-numeric values (i.e. properties or other metadata), you should consider creating an attribute hierarchy for often-searched items and then do one of the following:
    • Use a tuple when you need to get a value sliced by that new member
    • Use the EXCEPT() function when you need to negate that member on its own hierarchy (NOT or <>)
    • Use the EXISTS() function when you need to limit other hierarchies of the same dimension by that member
    • Use the NONEMPTY() function when you need to operate on other dimensions, that is, subcubes created with that new member
    • Use the 3-argument EXISTS() function instead of the NONEMPTY() function if you also want to get combinations with nulls in the corresponding measure group (nulls are available only when the NullProcessing property for a measure is set to Preserve)
  • When you need to filter by values and then count a member in that set, you should consider aggregate functions like SUM() with IIF() part in its expression, as described earlier.

 

LEAVE A REPLY

Please enter your comment!
Please enter your name here