Analysis Services: Solving Hierarchy Errors of Uniqueness

Multidimensional cubes are fast at retrieving the aggregations that drive business decisions. Being able to slice or group the measures by dimension attributes supports quick analysis and flexible, interactive reporting. Configuring these attributes as hierarchies involves some details that are not obvious at first, and the error message raised when they are wrong is not very helpful to someone new to cubes.

Let’s look at creating a Date hierarchy with Year, Quarter, Month and Day. Our cube already has measures created from a sales fact table and the dimension for date has been created without any hierarchies.


Figure 1: Attribute List for Date Dimension

The measures can be displayed in a Pivot Table in Excel. Figure 2 below shows the Sales amount sliced by Sales Territory and Year/Quarter/Month.


Figure 2: Pivot Table in Excel

Figure 2 shows the Year, Quarter and Month as Rows, while the Sales Territory is used as Columns and Internet Sales is used for Values in the Pivot Table. Users will get frustrated when they have to pick one attribute at a time when the hierarchy is logically known.

To create this hierarchy, you need to edit the date dimension in the cube.


Figure 3: Edit Date Dimension in SQL Server Data Tools

To create a hierarchy, you can drag and drop the first or top level of the hierarchy from the Attributes pane into the Hierarchies pane. We will do this with Year at the top level.


Figure 4: Drag Year to Create New Hierarchy

To finish this hierarchy, drag Quarter under Year, followed by the Month and Dates attributes. Then right-click the default name, Hierarchy, and select Rename. We renamed the hierarchy to Y-Q-M-D.


Figure 5: Y-Q-M-D Hierarchy Created

We can deploy this project and preview in Excel to see the effects.


Figure 6: Preview Hierarchy Y-Q-M-D in Excel

So, what is the problem at this point? There is a blue line under the hierarchy name in the cube project. The warning tells us to create an attribute relationship for the hierarchy, which matters for performance. This is done by editing the date dimension on the Attribute Relationships tab.


Figure 7: Attribute Relationship Does Not Exist.

Y-Q-M-D is a natural hierarchy because a Day is in a Month that is in a Quarter that is in a Year. So, we should be able to show that in the Attribute Relationship for this Hierarchy. You can drag and drop Quarter on Year, then drag and drop Month on Quarter to accomplish this. Dates is the root attribute or key to the dimension.


Figures 8 & 9: Before and After Y-Q-M-D Hierarchy

Now, when we deploy, we get an error. The error message with the red circle and white x does not tell us the problem. The problem is described in the last warning, which indicates that Quarter has duplicates for value 4. For an attribute relationship to work, each key value of an attribute has to identify exactly one member, so it can relate to only one parent. Quarter 4 (as well as 3, 2 and 1) is repeated for every year we have in the date dimension table.
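One way to confirm what the warning is reporting is to query the date dimension table directly. The sketch below assumes the AdventureWorksDW sample schema, where the table is named dbo.DimDate. Only the CalendarYear and CalendarQuarter column names come from this post, so adjust the table name to match your own source.

```sql
-- Assumption: the date dimension is dbo.DimDate (AdventureWorksDW sample).
-- Lists every quarter value that appears under more than one year,
-- which is the duplication the deployment warning complains about.
SELECT CalendarQuarter,
       COUNT(DISTINCT CalendarYear) AS YearsSharingThisKey
FROM dbo.DimDate
GROUP BY CalendarQuarter
HAVING COUNT(DISTINCT CalendarYear) > 1
ORDER BY CalendarQuarter;
```

Any row returned is a key value that cannot point to a single Year on its own.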


Figure 10: Deployment Failed

There are a couple of solutions for this problem, but we are only going to look at one. We are going to use multiple columns in the KeyColumn property of the Quarter and Month to create uniqueness. Then, we have to add a column to the NameColumn property in order to have something display for the multi-column KeyColumn property.


Figure 11: Changing the KeyColumn of Attribute Quarter

To do this, highlight the Quarter attribute in the Attributes pane, then go to the properties. Find the KeyColumn property and click on the ellipsis. When prompted, add CalendarYear to the Key Columns list and move the Year above the Quarter (Figure 11). Do the same thing for Month by adding CalendarYear to its KeyColumn.


Figure 12: NameColumn for Month Attribute

The NameColumn needs to be changed from nothing to CalendarQuarter for the Quarter attribute and EnglishMonthName for the Month attribute (Figure 12). Re-deploy the project and the error should no longer exist; we get a Deployment Completed Successfully message.
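Before re-deploying, the composite keys can be sanity-checked against the source table. This is only a sketch: it assumes the AdventureWorksDW dbo.DimDate table and assumes the Month attribute is keyed on MonthNumberOfYear, which this post does not state. Swap in whichever column your Month attribute actually uses.

```sql
-- Assumptions: dbo.DimDate (AdventureWorksDW) and MonthNumberOfYear
-- as the original key column of the Month attribute.
-- With CalendarYear added to the key, every (year, month) pair should
-- belong to exactly one quarter and resolve to exactly one display name.
SELECT CalendarYear,
       MonthNumberOfYear,
       COUNT(DISTINCT CalendarQuarter)  AS QuartersPerMonth,
       COUNT(DISTINCT EnglishMonthName) AS NamesPerMonth
FROM dbo.DimDate
GROUP BY CalendarYear, MonthNumberOfYear
HAVING COUNT(DISTINCT CalendarQuarter) > 1
    OR COUNT(DISTINCT EnglishMonthName) > 1;
```

An empty result means the Month to Quarter relationship holds for the data as it stands. Any rows returned point to keys that would still cause trouble during processing.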


Figure 13: Deployment Completed Successfully

The use of attribute relationships for natural hierarchies greatly improves the processing and retrieval of data from the cube. It also helps the aggregation builder index the combinations of dimension attributes needed for analysis. In the end, the cube can use the month aggregations to get quarter values, or the quarter values to get the year, which saves retrieving the details and aggregating them up the hierarchy.
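The roll-up idea can be pictured with a relational analogy. The query below is not what Analysis Services runs internally. It is a sketch that assumes the AdventureWorksDW tables dbo.FactInternetSales and dbo.DimDate, and it shows quarter totals being built from month totals rather than from the detail rows.

```sql
-- Assumption: AdventureWorksDW sample tables and columns.
-- Step 1: aggregate the fact rows once, at the month level.
WITH MonthlySales AS (
    SELECT d.CalendarYear,
           d.CalendarQuarter,
           d.MonthNumberOfYear,
           SUM(f.SalesAmount) AS SalesAmount
    FROM dbo.FactInternetSales AS f
    INNER JOIN dbo.DimDate AS d ON f.OrderDateKey = d.DateKey
    GROUP BY d.CalendarYear, d.CalendarQuarter, d.MonthNumberOfYear
)
-- Step 2: roll the month totals up to quarters without touching the fact table again.
SELECT CalendarYear,
       CalendarQuarter,
       SUM(SalesAmount) AS SalesAmount
FROM MonthlySales
GROUP BY CalendarYear, CalendarQuarter
ORDER BY CalendarYear, CalendarQuarter;
```

The second step is only valid because each month belongs to exactly one quarter, which is the guarantee the attribute relationship gives the cube.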

Cost Threshold and Max Degree of Parallelism

SQL Server has many options for configuring a database system. Most do not become apparent until some part of the system does not function “properly”. Parallelism is one of these settings. You will see parallelism waits and blocking if it is not configured well. The SQL Server instance defaults are 0 for Max Degree of Parallelism and 5 for Cost Threshold for Parallelism. What does this mean?


Advanced SQL Server Instance Properties

Maybe it is best to explain with an example. The machine used to write this blog has 4 logical CPUs on one processor. So, it is possible for a query or process in SQL Server to run on all 4 CPUs (threads) at the same time. Because the default for Max Degree of Parallelism is 0, all CPUs can be used when a process or query runs in parallel. I can change this to 2, and all processes or queries can then use a maximum of 2 CPUs unless a MAXDOP hint is specified in the query or process to use more or fewer.
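To see what an instance is currently using without opening the properties dialog, the two settings can be read from sys.configurations. A minimal sketch, assuming the login has permission to read the server-level views:

```sql
-- Current values of the two parallelism settings.
-- value is the configured value; value_in_use is what the instance is running with.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name IN (N'max degree of parallelism', N'cost threshold for parallelism');

-- Number of logical CPUs that SQL Server sees on this machine.
SELECT cpu_count
FROM sys.dm_os_sys_info;
```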

Let’s look at a query that runs in parallel.

USE AdventureWorks2014;

SET STATISTICS XML ON;

SELECT sod.SalesOrderID, sod.SalesOrderDetailID, sod.LineTotal
  , p.Class, p.ListPrice, p.Name
FROM Sales.SalesOrderDetail sod
INNER JOIN Production.Product p ON sod.ProductID = p.ProductID;

SET STATISTICS XML ON produces a second result set when the query runs, containing the XML of the execution plan.


Adventure Works query running in parallel

By clicking on the XML output, you will get the Execution Plan for the query.


Parallel Query

The operators with the yellow double arrows (Hash Match is one) indicate a parallel process during the execution of the plan. If we add OPTION (MAXDOP 1) to the query, it will not run in parallel and we will be able to see the cost of the query when it runs serially. This option forces the query to run on one CPU (thread) without any parallelism. We could also change the Max Degree of Parallelism on the instance, but that would be system-wide.


SELECT sod.SalesOrderID, sod.SalesOrderDetailID, sod.LineTotal
  , p.Class, p.ListPrice, p.Name
FROM Sales.SalesOrderDetail sod
INNER JOIN Production.Product p ON sod.ProductID = p.ProductID
OPTION (MAXDOP 1);

The following are the details of the query plan and cost. The estimated cost of this query is approximately 6.8.
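If you only want the estimated cost and do not need to run the query, one option is to request the estimated plan as XML. This is a sketch for SSMS or sqlcmd, where GO is the batch separator. SET SHOWPLAN_XML has to be the only statement in its batch.

```sql
USE AdventureWorks2014;
GO

-- Return the estimated plan instead of executing the statements that follow.
SET SHOWPLAN_XML ON;
GO

SELECT sod.SalesOrderID, sod.SalesOrderDetailID, sod.LineTotal
  , p.Class, p.ListPrice, p.Name
FROM Sales.SalesOrderDetail sod
INNER JOIN Production.Product p ON sod.ProductID = p.ProductID
OPTION (MAXDOP 1);
GO

-- Switch back to normal execution.
SET SHOWPLAN_XML OFF;
GO
```

In the returned XML, the StatementSubTreeCost attribute on the statement element holds the cost that is compared with the Cost Threshold. The exact number depends on your statistics and version, so it may not match the 6.8 shown here.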


No Parallelism with MAXDOP 1


Cost of Query

Now, if we change the Cost Threshold for Parallelism on the instance from 5 to 10, the cost of the query (6.8) is no longer above the threshold (10), and the query will not run in parallel even without the MAXDOP 1 option.


Change Cost Threshold and Max Degree of Parallelism

We have changed the Cost Threshold for Parallelism to 10 for the instance, as well as the Max Degree of Parallelism to 2. The 2 is an instance-wide setting to not use more than 2 CPUs for a query or process unless OPTION (MAXDOP) is used in the query or process. We will see this in the properties of the parallel operator in the screen below.
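The same change can also be scripted instead of made through the instance properties dialog. A minimal sketch using sp_configure follows. The values 10 and 2 are the ones used in this example, not a general recommendation.

```sql
-- Both settings are advanced options, so they have to be made visible first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Raise the cost threshold from the default of 5 to 10.
EXEC sp_configure 'cost threshold for parallelism', 10;

-- Limit parallel plans to 2 CPUs instance-wide.
EXEC sp_configure 'max degree of parallelism', 2;

-- Apply the new values.
RECONFIGURE;
```

Both settings take effect without restarting the instance. To put the threshold back to its default later, run the same call with 5. The query is then run again without the MAXDOP hint.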

SELECT sod.SalesOrderID, sod.SalesOrderDetailID, sod.LineTotal
  , p.Class, p.ListPrice, p.Name
FROM Sales.SalesOrderDetail sod
INNER JOIN Production.Product p ON sod.ProductID = p.ProductID


No Parallelism with New Cost Threshold

If I change the Cost Threshold back to 5, I will get parallelism with 2 CPUs.


MAXDOP 2 Shown in Hash Match Operator Properties

In the properties of the Hash Match, you can now see that the Actual Number of Rows is broken out across 3 threads. Thread 0 controls the other 2 threads, 1 and 2. Threads 1 and 2 are the 2 CPUs used to run the query in parallel, based on the instance setting of 2 for Max Degree of Parallelism.

So, in conclusion, the MAXDOP setting limits the number of CPUs available for queries or processes (like backups) to run in parallel. The Cost Threshold is the query cost above which the Query Processor considers whether parallelism will help a query or process.
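Before changing either setting on a real system, it can help to check whether parallelism waits are showing up at all. The sketch below reads the cumulative wait statistics. CXPACKET is the classic parallelism wait type. CXCONSUMER only exists on newer versions of SQL Server, so it may return no row on an older instance.

```sql
-- Cumulative parallelism waits since the instance last started
-- (or since the wait statistics were last cleared).
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'CXPACKET', N'CXCONSUMER')
ORDER BY wait_time_ms DESC;
```

High numbers here are not proof of a problem on their own, because some parallelism waits are normal. They are a prompt to look at which queries are going parallel and what they cost.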